[gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Jaime Pinto
pinto at scinet.utoronto.ca
Thu May 18 20:02:46 BST 2017
Ok Mark
I'll follow your option 2) suggestion, and capture what mmbackup is
using as a rule first, then modify it.
I imagine by 'capture' you are referring to the -L n level I use?
-L n
    Controls the level of information displayed by the mmbackup
    command. Larger values indicate the display of more detailed
    information. n should be one of the following values:

    3  Displays the same information as 2, plus each candidate
       file and the applicable rule.
    4  Displays the same information as 3, plus each explicitly
       EXCLUDEed or LISTed file, and the applicable rule.
    5  Displays the same information as 4, plus the attributes of
       candidate and EXCLUDEed or LISTed files.
    6  Displays the same information as 5, plus non-candidate
       files and their attributes.
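A rough sketch of that capture step (the log path is hypothetical, and the DEBUGmmbackup environment variable is something I have only seen referenced for preserving mmbackup's temporary policy files -- verify it against your GPFS level before relying on it):

```shell
# Run mmbackup verbosely enough (-L 4) that each candidate file and the
# applicable rule land in the log, and keep the output for inspection.
logfile=/tmp/mmbackup-capture.log       # hypothetical path

# DEBUGmmbackup=2 is reported to preserve the generated policy files in
# the temp dir passed via -s; confirm this on your own GPFS level.
DEBUGmmbackup=2 mmbackup /gpfs/sgfs1 -s /dev/shm -L 4 2>&1 | tee "$logfile"

# If the temp files were preserved, the generated rules should be here:
ls /dev/shm/mmbackup*
```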
Thanks
Jaime
Quoting "Marc A Kaplan" <makaplan at us.ibm.com>:
> 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup
> wants to support incremental backups (using what it calls its shadow
> database) and keep both your sanity and its sanity -- so mmbackup limits
> you to either full filesystem or full inode-space (independent fileset.)
> If you want to do something else, okay, but you have to be careful and be
> sure of yourself. IBM will not be able to jump in and help you if and when
> it comes time to restore and you discover that your backup(s) were not
> complete.
>
> 2. If you decide you're a big boy (or woman or XXX) and want to do some
> hacking ... Fine... But even then, I suggest you do the smallest hack
> that will mostly achieve your goal...
> DO NOT think you can create a custom policy rules list for mmbackup out of
> thin air.... Capture the rules mmbackup creates and make small changes to
> that --
> And as with any disaster recovery plan..... Plan your Test and Test your
> Plan.... Then do some dry run recoveries before you really "need" to do a
> real recovery.
>
> I only even suggest this because Jaime says he has a huge filesystem with
> several dependent filesets and he really, really wants to do a partial
> backup, without first copying or re-organizing the filesets.
>
> HMMM.... otoh... if you have one or more dependent filesets that are
> smallish, and/or you don't need the backups -- create independent
> filesets, copy/move/delete the data, rename, voila.
>
>
>
> From: "Jaime Pinto" <pinto at scinet.utoronto.ca>
> To: "Marc A Kaplan" <makaplan at us.ibm.com>
> Cc: "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
> Date: 05/18/2017 12:36 PM
> Subject: Re: [gpfsug-discuss] What is an independent fileset? was:
> mmbackup with fileset : scope errors
>
>
>
> Marc
>
> The -P option may be a very good workaround, but I still have to test it.
>
> I'm currently trying to craft the mm rule, as minimal as possible;
> however, I'm not sure what attributes mmbackup expects to see.
>
> Below is my first attempt. It would be nice to get comments from
> somebody familiar with the inner workings of mmbackup.
>
> Thanks
> Jaime
>
>
> /* A macro to abbreviate VARCHAR */
> define([vc],[VARCHAR($1)])
>
> /* Define an external list */
> RULE EXTERNAL LIST 'allfiles' EXEC
>      '/scratch/r/root/mmpolicyRules/mmpolicyExec-list'
>
> /* Generate a list of all files, directories, and all other file
>    system objects (symlinks, named pipes, etc.). Include the owner's
>    id, timestamps, and size with each object. */
>
> RULE 'r1' LIST 'allfiles'
>      DIRECTORIES_PLUS
>      SHOW('-u ' || vc(USER_ID) || ' -a ' || vc(ACCESS_TIME) || ' -m ' ||
>           vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE))
>      FROM POOL 'system'
>      FOR FILESET('sysadmin3')
>
> /* Files in special filesets, such as those excluded, are never traversed
> */
> RULE 'ExcSpecialFile' EXCLUDE
> FOR FILESET('scratch3','project3')
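Before handing a draft like this to mmbackup, it can be smoke-tested with mmapplypolicy in test mode (the draft file name is hypothetical; -I test evaluates the rules without executing any actions):

```shell
# Evaluate the draft rules against the fileset without acting on them;
# -L 3 shows each candidate file and the applicable rule.
mmapplypolicy /gpfs/sgfs1/sysadmin3 --scope fileset \
    -P /root/mmbackup.policy.draft -I test -L 3
```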
>
>
>
>
>
> Quoting "Marc A Kaplan" <makaplan at us.ibm.com>:
>
>> Jaime,
>>
>> While we're waiting for the mmbackup expert to weigh in, notice that
>> the mmbackup command does have a -P option that allows you to provide
>> a customized policy rules file.
>>
>> So... a fairly safe hack is to do a trial mmbackup run, capture the
>> automatically generated policy file, and then augment it with FOR
>> FILESET('fileset-I-want-to-backup') clauses.... Then run the mmbackup
>> for real with your customized policy file.
>>
>> mmbackup uses mmapplypolicy which by itself is happy to limit its
>> directory scan to a particular fileset by using
>>
>> mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope
>> fileset ....
>>
>> However, mmbackup probably has other worries, and for simplicity and
>> to help make sure you get complete, sensible backups, it apparently has
>> imposed some restrictions to preserve sanity (yours and our support
>> team's! ;-) ) ... (For example, suppose you were doing incremental
>> backups, starting at different paths each time? -- it would be happy to
>> do so, but when disaster strikes and you want to restore -- you'll end
>> up confused and/or unhappy!)
>>
>> "converting from one fileset to another" --- sorry there is no such
> thing.
>> Filesets are kinda like little filesystems within filesystems. Moving
> a
>> file from one fileset to another requires a copy operation. There is
> no
>> fast move nor hardlinking.
>>
>> --marc
>>
>>
>>
>> From: "Jaime Pinto" <pinto at scinet.utoronto.ca>
>> To: "gpfsug main discussion list"
> <gpfsug-discuss at spectrumscale.org>,
>> "Marc A Kaplan" <makaplan at us.ibm.com>
>> Date: 05/18/2017 09:58 AM
>> Subject: Re: [gpfsug-discuss] What is an independent fileset?
> was:
>> mmbackup with fileset : scope errors
>>
>>
>>
>> Thanks for the explanation Mark and Luis,
>>
>> This begs the question: why are filesets created as dependent by
>> default, if the adverse repercussions can be so great afterward? Even
>> in my case, where I manage GPFS and TSM deployments (and I have been
>> around for a while), I didn't realize at all that not adding an extra
>> option at fileset creation time would cause me huge trouble with
>> scaling later on as I try to use mmbackup.
>>
>> When you have different groups managing file systems and backups that
>> don't read each other's manuals ahead of time, you have a really
>> bad recipe.
>>
>> I'm looking forward to your explanation as to why mmbackup cares one
>> way or another.
>>
>> I'm also hoping for a hint as to how to configure backup exclusion
>> rules on the TSM side to exclude fileset traversing on the GPFS side.
>> Is mmbackup smart enough (actually smarter than the TSM client itself)
>> to read the exclusion rules in the TSM configuration and apply them
>> before traversing?
>>
>> Thanks
>> Jaime
>>
>> Quoting "Marc A Kaplan" <makaplan at us.ibm.com>:
>>
>>> When I see "independent fileset" (in Spectrum/GPFS/Scale) I always
>> think
>>> and try to read that as "inode space".
>>>
>>> An "independent fileset" has all the attributes of an (older-fashioned)
>>> dependent fileset PLUS all of its files are represented by inodes that
>> are
>>> in a separable range of inode numbers - this allows GPFS to efficiently
>> do
>>> snapshots of just that inode-space (uh... independent fileset)...
>>>
>>> And... of course the files of dependent filesets must also be
>> represented
>>> by inodes -- those inode numbers are within the inode-space of whatever
>>> the containing independent fileset is... as was chosen when you created
>>> the fileset.... If you didn't say otherwise, inodes come from the
>>> default "root" fileset....
>>>
>>> Clear as your bath-water, no?
>>>
>>> So why does mmbackup care one way or another ??? Stay tuned....
>>>
>>> BTW - if you look at the bits of the inode numbers carefully --- you
> may
>>> not immediately discern what I mean by a "separable range of inode
>>> numbers" -- (very technical hint) you may need to permute the bit order
>>> before you discern a simple pattern...
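Marc's bit-permutation hint can be illustrated with a toy model (my own illustration for intuition only, NOT the real GPFS inode-number layout): if the inode-space id lived in the low-order bits, raw inode numbers from one space would look scattered, yet become one contiguous range once the bit fields are swapped.

```python
# Toy model only -- NOT GPFS's actual layout. Pack a per-space sequence
# number in the high bits and the inode-space id in the low bits.
SPACE_BITS = 4  # toy width of the inode-space id field

def toy_inode(space_id: int, seq: int) -> int:
    """Pack a toy inode number: sequence high, space id low."""
    return (seq << SPACE_BITS) | space_id

def permute(inode: int, total_bits: int = 16) -> int:
    """Swap the fields: put the space id in the high bits instead."""
    space_id = inode & ((1 << SPACE_BITS) - 1)
    seq = inode >> SPACE_BITS
    return (space_id << (total_bits - SPACE_BITS)) | seq

# Raw inode numbers of space 3 look scattered...
raw = [toy_inode(3, s) for s in range(4)]   # [3, 19, 35, 51]
# ...but after permuting the bit order they form one contiguous range.
perm = [permute(i) for i in raw]            # [12288, 12289, 12290, 12291]
```

With a layout like this, "snapshot just this inode-space" reduces to scanning one contiguous (post-permutation) range instead of the whole inode file.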
>>>
>>>
>>>
>>> From: "Luis Bolinches" <luis.bolinches at fi.ibm.com>
>>> To: gpfsug-discuss at spectrumscale.org
>>> Cc: gpfsug-discuss at spectrumscale.org
>>> Date: 05/18/2017 02:10 AM
>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope
>> errors
>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>>
>>>
>>>
>>> Hi
>>>
>>> There is no direct way to convert a fileset that is dependent to
>>> independent, or vice versa.
>>>
>>> I would suggest taking a look at chapter 5 of the 2014 redbook; it has
>>> lots of definitions about GPFS ILM, including filesets:
>>> http://www.redbooks.ibm.com/abstracts/sg248254.html?Open It is not the
>>> only place where this is explained, but I honestly believe it is a good
>>> single starting point. It also needs an update, as it does not have
>>> anything on CES nor ESS, so anyone on this list should feel free to
>>> give feedback on that page; people with funding decisions listen there.
>>>
>>> So you are limited to either migrating the data from that fileset to a
>>> new independent fileset (multiple ways to do that) or using the TSM
>>> client config.
>>>
>>> ----- Original message -----
>>> From: "Jaime Pinto" <pinto at scinet.utoronto.ca>
>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>> To: "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>,
>>> "Jaime Pinto" <pinto at scinet.utoronto.ca>
>>> Cc:
>>> Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors
>>> Date: Thu, May 18, 2017 4:43 AM
>>>
>>> There is hope. See reference link below:
>>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm
>>>
>>> The issue has to do with dependent vs. independent filesets, something
>>> I didn't even realize existed until now. Our filesets are dependent
>>> (for no particular reason), so I have to find a way to turn them into
>>> independent.
>>>
>>> The proper option syntax is "--scope inodespace", and the error
>>> message actually flagged that; however, I didn't know how to
>>> interpret what I saw:
>>>
>>>
>>> # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm
>>> --scope inodespace --tsm-errorlog $logfile -L 2
>>> --------------------------------------------------------
>>> mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17
>>> 21:27:43 EDT 2017.
>>> --------------------------------------------------------
>>> Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent*
>>> fileset sysadmin3 is not supported
>>> Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for
>>> fileset level backup. exit 1
>>> --------------------------------------------------------
>>>
>>> Will post the outcome.
>>> Jaime
>>>
>>>
>>>
>>> Quoting "Jaime Pinto" <pinto at scinet.utoronto.ca>:
>>>
>>>> Quoting "Luis Bolinches" <luis.bolinches at fi.ibm.com>:
>>>>
>>>>> Hi
>>>>>
>>>>> have you tried to add exceptions on the TSM client config file?
>>>>
>>>> Hey Luis,
>>>>
>>>> That would work as well (mechanically), however it's not elegant or
>>>> efficient. When you have over 1PB and 200M files on scratch, it will
>>>> take many hours and several helper nodes to traverse that fileset,
>>>> just to be negated by TSM. In fact, exclusions on TSM are just as
>>>> inefficient. Considering that I want to keep project and sysadmin on
>>>> different domains, it's much worse, since we have to traverse and
>>>> exclude scratch & (project|sysadmin) twice, once to capture sysadmin
>>>> and again to capture project.
>>>>
>>>> If I have to use exclusion rules, they have to rely solely on GPFS
>>>> rules, and somehow not traverse scratch at all.
>>>>
>>>> I suspect there is a way to do this properly; however, the examples in
>>>> the GPFS guide and other references are not exhaustive. They only show
>>>> a couple of trivial cases.
>>>>
>>>> However, my situation is not unique. I suspect there are many
>>>> facilities having to deal with backups of HUGE filesets.
>>>>
>>>> So the search is on.
>>>>
>>>> Thanks
>>>> Jaime
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is
>>>>> linked on /IBM/GPFS/FSET1
>>>>>
>>>>> dsm.sys
>>>>> ...
>>>>>
>>>>> DOMAIN /IBM/GPFS
>>>>> EXCLUDE.DIR /IBM/GPFS/FSET1
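Adapted to the filesystem discussed in this thread (fileset names taken from Jaime's earlier message; note this is only the TSM-side half -- whether the GPFS-side scan still traverses the excluded fileset is exactly the open question here):

```
dsm.sys
...

DOMAIN /gpfs/sgfs1
EXCLUDE.DIR /gpfs/sgfs1/scratch3
```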
>>>>>
>>>>>
>>>>> From: "Jaime Pinto" <pinto at scinet.utoronto.ca>
>>>>> To: "gpfsug main discussion list"
>>> <gpfsug-discuss at spectrumscale.org>
>>>>> Date: 17-05-17 23:44
>>>>> Subject: [gpfsug-discuss] mmbackup with fileset : scope errors
>>>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>>>>
>>>>>
>>>>>
>>>>> I have a g200 /gpfs/sgfs1 filesystem with 3 filesets:
>>>>> * project3
>>>>> * scratch3
>>>>> * sysadmin3
>>>>>
>>>>> I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we
>>>>> have no need or space to include *scratch3* on TSM.
>>>>>
>>>>> Question: how to craft the mmbackup command to backup
>>>>> /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only?
>>>>>
>>>>> Below are 3 types of errors:
>>>>>
>>>>> 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm
>>>>> --tsm-errorlog $logfile -L 2
>>>>>
>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem
>>>>> cannot be specified at the same time.
>>>>>
>>>>> 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm
>>>>> --scope inodespace --tsm-errorlog $logfile -L 2
>>>>>
>>>>> ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up
>>>>> dependent fileset sysadmin3 is not supported
>>>>> Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for
>>>>> fileset level backup. exit 1
>>>>>
>>>>> 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm
>>>>> --scope filesystem --tsm-errorlog $logfile -L 2
>>>>>
>>>>> ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem
>>>>> cannot be specified at the same time.
>>>>>
>>>>> These examples don't really cover my case:
>>>>> https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples
>>>>>
>>>>>
>>>>> Thanks
>>>>> Jaime
>>>>>
>>>>>
>>>>> ************************************
>>>>> TELL US ABOUT YOUR SUCCESS STORIES
>>>>> http://www.scinethpc.ca/testimonials
>>>>> ************************************
>>>>> ---
>>>>> Jaime Pinto
>>>>> SciNet HPC Consortium - Compute/Calcul Canada
>>>>> www.scinet.utoronto.ca - www.computecanada.ca
>>>>> University of Toronto
>>>>> 661 University Ave. (MaRS), Suite 1140
>>>>> Toronto, ON, M5G1M1
>>>>> P: 416-978-2755
>>>>> C: 416-505-1477
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> This message was sent using IMP at SciNet Consortium, University of
>>>>> Toronto.
>>>>>
>>>>> _______________________________________________
>>>>> gpfsug-discuss mailing list
>>>>> gpfsug-discuss at spectrumscale.org
>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
>>>>> Oy IBM Finland Ab
>>>>> PL 265, 00101 Helsinki, Finland
>>>>> Business ID, Y-tunnus: 0195876-3
>>>>> Registered in Finland
>>>>>
>>>>
>>>
>>>