SSUG::Digital: 007 – Manage the lifecycle of your files using the policy engine
21st October 2020 @ 16:00 - 17:00 BST
This episode provides a comprehensive introduction to the IBM Spectrum Scale policy engine. It highlights the underlying architecture and how policies are executed in an IBM Spectrum Scale cluster. It also discusses example rules and policies that facilitate Information Lifecycle Management, accompanied by practical tips.
References
- Whitepaper: IBM Spectrum Scale ILM and Archiving Policies – A practical Guide
- Spectrum Scale ILM policy examples and scripts
- Apache Tika
Q&A
Q: Which type of nodes participate in policy execution?
A: It depends on the nodes specified with the -N option of the mmapplypolicy command. If -N is specified, the command runs parallel instances of the policy code on the nodes or node class given with that option. If -N is not specified, the command runs parallel instances on the nodes named by the defaultHelperNodes attribute, which is set with the mmchconfig command. For more information see the IBM Spectrum Scale knowledge center: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.5/com.ibm.spectrum.scale.v5r05.doc/bl1adm_mmapplypolicy.htm
Q: Can I somehow identify the type of a file via the policy engine, e.g. via its magic bytes? Or do I have to rely on the file extension?
A: The policy engine does not allow access to the data – only the file’s metadata including extended attributes can be evaluated by the policy engine. To identify the type of a file with the policy engine an EXTERNAL LIST rule can be used along with an external script that determines the type of files.
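A minimal sketch of the kind of external script such an EXTERNAL LIST rule might call, written in Python. It assumes that each record in the file list ends with " -- " followed by the full path of the file, and it checks only a handful of well-known signatures. The names SIGNATURES, sniff_type and paths_from_filelist are illustrative and are not part of IBM Spectrum Scale.

```python
# A few well-known file signatures (magic bytes); extend as needed.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
]


def sniff_type(path, default="unknown"):
    """Return a type label based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return default


def paths_from_filelist(lines):
    """Yield the path of each record.

    Assumption: a record looks like '<fields> -- /full/path'.
    Records without the ' -- ' separator are skipped.
    """
    for line in lines:
        _, sep, path = line.rstrip("\n").partition(" -- ")
        if sep:
            yield path
```

The script reads the file contents itself, which the policy engine cannot do; the policy engine only selects the candidate files and hands over the list.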
Q: Will the external tool process the filelist in parallel on all nodes, which are used to generate the filelist?
A: Yes. If an external tool or interface script is defined in an EXTERNAL POOL rule, this script is executed on all nodes that are specified with the -N option of the mmapplypolicy command. This assumes that every node specified with the -N option has access to the interface script; if this is not the case, the policy run fails. With the mmapplypolicy command you can control the number of instances of the external tool with the -m option, and the number of files passed to one instance with the -B option.
Q: Are there any limitations or recommendations around the length of rules in policy files? For example, we have ~750 filesets whose data we want to place on specific pools. Should we just have one rule, or many rules for this?
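To show how the -N, -m and -B options fit together, here is a small Python sketch that assembles an mmapplypolicy command line. The helper name build_mmapplypolicy_cmd and the example values (gpfs0, policy.pol, node1, node2) are hypothetical; -P names the policy file.

```python
def build_mmapplypolicy_cmd(device, policy_file, nodes=None,
                            threads=None, max_files=None):
    """Assemble an mmapplypolicy argument list (the command is not run here)."""
    cmd = ["mmapplypolicy", device, "-P", policy_file]
    if nodes:
        # -N: nodes (or a node class) that run the policy in parallel
        cmd += ["-N", ",".join(nodes)]
    if threads is not None:
        # -m: number of instances of the external tool
        cmd += ["-m", str(threads)]
    if max_files is not None:
        # -B: number of files passed to one instance
        cmd += ["-B", str(max_files)]
    return cmd


# Hypothetical example values
cmd = build_mmapplypolicy_cmd("gpfs0", "policy.pol",
                              nodes=["node1", "node2"],
                              threads=4, max_files=100)
print(" ".join(cmd))
```

The list form can be handed to subprocess.run on a cluster node; every node named with -N must be able to reach the interface script.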
A: Placement policies are stored in a single file. The challenge is not so much the length of the file as the number of placement rules it contains. Whenever a file is created, the policy engine must walk through the rules until it finds a match, so a large number of rules delays file creation. I therefore recommend keeping the number of placement rules low. For example, you could organize the placement policy by storage pool. There is a limit of eight storage pools, so this would lead to at most eight placement rules. In each rule you can use the FOR FILESET clause to specify multiple filesets to be placed on a pool.
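As an illustration of organizing placement rules by pool, the following Python sketch turns a fileset-to-pool mapping into one placement rule per pool. The function name placement_policy, the rule names and the fileset and pool names are made up for the example; review the generated rules before installing them.

```python
from collections import defaultdict


def placement_policy(fileset_to_pool, default_pool):
    """Return policy text with one SET POOL rule per storage pool."""
    by_pool = defaultdict(list)
    for fileset, pool in sorted(fileset_to_pool.items()):
        by_pool[pool].append(fileset)

    rules = []
    for pool in sorted(by_pool):
        filesets = ",".join(f"'{name}'" for name in by_pool[pool])
        rules.append(
            f"RULE 'place_{pool}' SET POOL '{pool}' FOR FILESET ({filesets})"
        )
    # Catch-all rule for files in filesets that are not listed above
    rules.append(f"RULE 'default' SET POOL '{default_pool}'")
    return "\n".join(rules)


# Hypothetical mapping; a real one could hold all ~750 filesets
print(placement_policy(
    {"proj_a": "silver", "proj_b": "gold", "proj_c": "silver"},
    default_pool="system",
))
```

With this layout the number of rules grows with the number of pools rather than the number of filesets.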