[gpfsug-discuss] Updating a medium size cluster efficiently
Jonathan Buzzard
jonathan.buzzard at strath.ac.uk
Wed Apr 27 21:21:07 BST 2022
On 27/04/2022 14:19, Hannappel, Juergen wrote:
>
> Hi,
> we have a medium size gpfs client cluster (a few hundred nodes)
> and want to update the gpfs version in an efficient way in a
> rolling update, i.e. update each node when it can be rebooted.
>
> Doing so via a slurm script when the node is drained just before
> the reboot works only most of the time, because in some cases even
> when the node is drained the file systems are still busy and can't be unmounted,
> so the update fails.
>
I assume this is because you have dead jobs?
The general trick is to submit a job as a special user that has sudo
privileges, that runs as the next job on every node. That way you don't
need to wait for the node to drain. Last "user" job on the node finishes
and then the "special" job runs. It does its magic and reboots the node.
Winner winner chicken dinner.
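The "run as the next job on every node" trick can be sketched roughly as below. This is an assumed implementation, not from the original post: the partition enumeration via sinfo, the script path, and the use of a very high --priority (which needs Slurm operator/admin rights, consistent with the "special user" above) are all my guesses.

```shell
#!/bin/bash
# Sketch: queue one exclusive, top-priority single-node job per node, so it
# runs as the *next* job on each node without waiting for a full drain.
# The script path /usr/local/sbin/gpfs-upgrade-and-reboot.sh is hypothetical.
queue_upgrade_jobs() {
    for node in $(sinfo --noheader --format="%n"); do
        sbatch --nodelist="$node" --nodes=1 --exclusive \
               --priority=4294967294 \
               --job-name="gpfs-upgrade-$node" \
               /usr/local/sbin/gpfs-upgrade-and-reboot.sh
    done
}
```

The submitted script would then do the upgrade and reboot while no user job can sneak onto the node.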
> Therefore I tried to trigger the update on the reboot, before gpfs starts.
> To do so I added a systemd service that is scheduled before the gpfs.service,
> which does a yum update (we run CentOS 7.9) but:
>
> In the postinstall script of gpfs.base the gpfs.service is disabled and re-enabled
> via systemctl, and systemd apparently gets that wrong, so that if
> the update really happens it afterwards will not start the gpfs.service.
>
> Does anyone have a clever way how to do a rolling update that really works
> without manually hunting after some per cent of machines that don't manage
> it on the first go?
>
What you could do to find the nodes that don't work is have the upgrade
script do an mmshutdown first before attempting the upgrade. Then check
it actually managed to shut down, and if it didn't, send an email to
an appropriate person saying there is an issue before, say, putting the
node in drain.
The man page for mmshutdown says it has an exit code of zero on success
and non-zero on failure, so it should be trivial to script.
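A minimal sketch of that check, assuming Slurm for the drain and a hypothetical admin address (neither is specified in the post):

```shell
#!/bin/bash
# Sketch of the "mmshutdown first, then upgrade" check.
# hpc-admin@example.ac.uk, the drain reason, and the helper names are
# assumptions for illustration only.

notify_and_drain() {
    # Tell a human and take the node out of service.
    echo "mmshutdown failed on $(hostname)" \
        | mail -s "GPFS upgrade blocked" hpc-admin@example.ac.uk
    scontrol update NodeName="$(hostname -s)" State=DRAIN Reason="GPFS busy"
}

upgrade_node() {
    # $1: the shutdown command; mmshutdown exits 0 on success, non-zero on failure.
    if "$1"; then
        yum --assumeyes update gpfs.base   # clean shutdown: safe to upgrade
    else
        notify_and_drain                   # shutdown failed: flag the node
        return 1
    fi
}

# On a real node you would call:  upgrade_node /usr/lpp/mmfs/bin/mmshutdown
```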
Being really clever I think you could then have the script submit a
second copy of itself to the node that again will run as the next job
and then reboot the node. That way when it comes back up it should be
able to unmount GPFS and install the upgrade as the reboot will have
cleared the issues that prevented the mmshutdown from working. You would
obviously need to trial this out.
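The "resubmit a copy of itself, then reboot" step might look something like this untested sketch; the --priority trick and "$0" resubmission are assumptions about how you would wire it up:

```shell
#!/bin/bash
# Sketch: on a failed mmshutdown, re-queue this very script as the node's
# next job, then reboot. After the reboot the stale mounts should be gone
# and the upgrade should succeed on the second pass.
retry_after_reboot() {
    sbatch --nodelist="$(hostname -s)" --nodes=1 --exclusive \
           --priority=4294967294 "$0"
    systemctl reboot
}
```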
If you are just looking to upgrade gpfs.gplbin and don't want to have it
recompiled on every node, then there is a trick with systemd. What
you do is create /etc/systemd/system/gpfs.service.d/install-module.conf
with the following contents:
[Service]
ExecStartPre=-/usr/bin/yum --assumeyes install gpfs.gplbin-%v
then every time GPFS starts up it attempts to install the module for the
currently running kernel (the special magic %v). This presumes you have
a repository with the appropriate gpfs.gplbin RPM set up. Basically I
take a node out, build the RPM, test it is working and then deploy.
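Installing that drop-in by hand (rather than via the special RPM mentioned below) would look roughly like this; only the drop-in contents come from the post, the wrapper function is mine:

```shell
#!/bin/bash
# Sketch: install the gpfs.service drop-in and make systemd re-read it.
# $1: the drop-in directory, normally /etc/systemd/system/gpfs.service.d
install_gpfs_dropin() {
    mkdir -p "$1"
    cat > "$1/install-module.conf" <<'EOF'
[Service]
ExecStartPre=-/usr/bin/yum --assumeyes install gpfs.gplbin-%v
EOF
    systemctl daemon-reload   # pick up the new unit drop-in
}

# On a node: install_gpfs_dropin /etc/systemd/system/gpfs.service.d
```

The leading "-" on ExecStartPre means a failed yum install (e.g. repository unreachable) does not block GPFS from starting.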
I have a special RPM that installs the above local customization to the
GPFS service unit file.
JAB.
--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG