Earlier this week, on a sunny Manchester morning, I hosted our second ‘meet the devs’ GPFS user group session. In February we met in London and had a very open discussion with a short demo. This session had different attendees in a different city, with different food and a different focus.
Our focus was hands-on testing of the upcoming 4.1.1 release, and specifically the protocol-related work done in the UK. After some minimal slideware, every attendee was provided with some hardware and a GPFS development engineer. All of this followed pizza and coffee, so everyone was energised and ready.
What did we all learn? You can’t plan for fire alarms, and running virtual clusters on demo laptops with no network can be challenging! Enough said.
The last session left me with pages of notes, but this time I feel strangely naked as I write. The focus on hands-on testing left me with few personal notes, but my team of developers made some great notes of things to go away and fix, or to consider for future releases. Allow me to share some of our discoveries.
Install/Config: Having a new CLI toolkit is highly desirable and will save time – day-to-day maintenance and upgrade are much more important than first-time install. One client would like more control over certain attributes, such as pagepool and subnets, during install. We should make the install configuration file a feature in its own right – effectively a cluster definition file. For upgrade, it would be nice to see a dry run first so the user is aware of what it is going to do. Before we release the toolkit, we should make it clear which commands perform configuration actions as opposed to actions on the nodes themselves.
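To make the ‘cluster definition file’ and ‘dry run’ suggestions concrete, here is a minimal Python sketch. The INI-style definition file and the apply_cluster_definition helper are hypothetical illustrations; only the mmchconfig command and its pagepool and subnets attributes are real GPFS pieces.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: treat the install configuration file as a
cluster definition, with a dry-run mode that prints the GPFS commands
it would run instead of executing them.

The INI layout is invented for illustration; mmchconfig, pagepool and
subnets are real GPFS commands/attributes."""
import configparser
import subprocess
import sys

def apply_cluster_definition(path: str, dry_run: bool = True) -> None:
    cfg = configparser.ConfigParser()
    cfg.read(path)
    # Map each entry in the [config] section onto an mmchconfig call.
    for attr, value in cfg["config"].items():
        cmd = ["mmchconfig", f"{attr}={value}"]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # An example definition file might contain:
    #   [config]
    #   pagepool = 8G
    #   subnets = 10.10.0.0
    apply_cluster_definition(sys.argv[1], dry_run="--apply" not in sys.argv)
```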
Performance/Monitoring: Most users who attended had an interest in performance and monitoring. The machine-readable data for system states, and for queries on network/system/IO information, was seen as a very positive step forward. What performance sensors are on by default? Can you generate a heat map of what’s being used? Can you provide a metadata vs. data query? Can you show performance vs. load (e.g. which nodes are running slower than expected)? Can we have block and partial-block write counter increments? Can we integrate monitoring and events with existing tools like TEAL?
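Several of these questions circle around machine-readable data, and GPFS already exposes some of it today via mmpmon. As a rough illustration (not a supported tool), here is a small Python sketch that aggregates the keyword/value output of an fs_io_s request run through mmpmon -p; the _fs_, _br_ and _bw_ field meanings come from the mmpmon documentation, while the script itself is just an assumption about how one might start building a heat map.

```python
"""Sketch: turn mmpmon's machine-readable output into per-file-system
I/O counters, a possible starting point for a crude heat map.

Assumes input piped from: echo fs_io_s | mmpmon -p
Field meanings (per the mmpmon docs): _fs_ file system name,
_br_ bytes read, _bw_ bytes written."""
import sys
from collections import defaultdict

def parse_fs_io_s(line: str) -> dict:
    """Parse one _fs_io_s_ line of `mmpmon -p` output into a dict."""
    tokens = line.split()
    if not tokens or tokens[0] != "_fs_io_s_":
        return {}
    # After the record tag, the line is alternating _keyword_ value pairs.
    return dict(zip(tokens[1::2], tokens[2::2]))

def summarise(lines) -> None:
    totals = defaultdict(lambda: {"read": 0, "written": 0})
    for line in lines:
        rec = parse_fs_io_s(line)
        if rec:
            fs = rec.get("_fs_", "?")
            totals[fs]["read"] += int(rec.get("_br_", 0))
            totals[fs]["written"] += int(rec.get("_bw_", 0))
    for fs, t in sorted(totals.items()):
        print(f"{fs}: {t['read']} bytes read, {t['written']} bytes written")

if __name__ == "__main__":
    # Typical use: echo fs_io_s | mmpmon -p | python3 this_script.py
    summarise(sys.stdin)
```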
General stuff: What will the release cycle look like in the future? What are we doing to test multi-cluster setups? Sizing systems cannot easily be reduced to a formula! We are introducing the ability to throttle maintenance tasks – is there a way, or can we add one, to prioritise one file system over another too?
This isn’t the place to answer all the questions that were thrown out there, but it should give you some context.
It is always nice to be thanked for hosting a session, but the best testament was that people said they would come again – and Daniel’s comment, which went something like “more of my team would have come if they’d known it was like this – they expected some slides about new things that might be coming in 2017”.
A big thank you to all the attendees, including my team in Manchester who worked behind the scenes to have hardware ready for our hands-on session. As the day ended I felt development was a little more agile and that we all understood our users a little better. Throw in some pizza and coffee and I can honestly state it was one of my best work days in a long time.
Now I shall sign off and go and read more about Robin Hood – not because I love folklore, but to better understand why some of our clients like it so much.
Hopefully see you in York.