[gpfsug-discuss] Associating I/O operations with files/processes
Andreas Petzold (SCC)
andreas.petzold at kit.edu
Tue May 30 15:00:27 BST 2017
Hi Sven,
We are seeing FileBlockRandomReadFetchHandlerThread.
I'll let you know once I have more results.
Thanks,
Andreas
On 05/30/2017 03:55 PM, Sven Oehme wrote:
> Hi,
>
> The very first thing to do would be to run mmfsadm dump iohist once, instead
> of mmdiag --iohist (we are actually adding this info to mmdiag --iohist in the
> next release), to see if the thread type reveals something:
>
> 07:25:53.578522 W data 1:20260249600 8192 35.930 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch WritebehindWorkerThread
> 07:25:53.632722 W data 1:20260257792 8192 45.179 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner CleanBufferThread
> 07:25:53.662067 W data 2:20259815424 8192 45.612 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch WritebehindWorkerThread
> 07:25:53.734274 W data 1:19601858560 8 0.624 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler *DioHandlerThread*
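>
> To pick out just the direct I/O requests from that output, a simple filter
> should do (a sketch; the grep pattern just matches the DioHandlerThread column
> shown above):
>
> # run on the affected protocol node and keep only the direct-I/O requests
> mmfsadm dump iohist | grep -i diohandler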
>
> If you see DioHandlerThread, most likely somebody changed an open flag to use
> O_DIRECT. Without that flag, even if the app does only 4k I/O (which is
> inefficient), GPFS will detect this and do prefetch/writebehind in large
> blocks. As soon as you add O_DIRECT we don't do this anymore, to honor the
> hint, and then every single request gets handled one by one.
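>
> A quick way to check for that from the application side (a sketch; <pid> is
> just a placeholder for the process you suspect, and it assumes strace is
> available on the node) is to watch the open flags:
>
> # attach to the suspected process and look for O_DIRECT in its open()/openat() calls
> strace -f -e trace=open,openat -p <pid> 2>&1 | grep O_DIRECT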
>
> After that, the next thing would be to run a very low-level trace with just
> I/O info, like:
>
> mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N .
>
> This will start collection on the node where you execute the command; if you
> want to run it against a different node, replace the dot at the end with the
> hostname.
> Wait a few seconds and run
>
> mmtracectl --off
>
> You will get a message that the trace is being formatted and eventually a
> trace file.
> Now grep for FIO and you get lines like this:
>
>
> 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 nSectors 32768 err 0
>
> If you further reduce it to nSectors 8, you would focus only on the 4k I/Os
> you mentioned above.
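>
> For example, something along these lines (a sketch; <trace-report-file> is
> just a placeholder for whatever file mmtracectl reports it wrote):
>
> # keep only the FIO records, then only the 8-sector (4 KiB) requests;
> # the trailing space after the 8 avoids also matching e.g. nSectors 8192
> grep "TRACE_IO: FIO" <trace-report-file> | grep "nSectors 8 "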
>
> The key item in the line above that you care about is the tag (1670183 here);
> this is the inode number of your file.
> If you now do:
>
> cd /usr/lpp/mmfs/samples/util ; make
> then run (obviously replacing the -i argument and the filesystem path with your own)
>
> [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/
>
> and you get a hit like this:
>
> 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf
>
> You now know that the file being accessed in the I/O example above is
> /ibm/fs2-16m-09//shared/test-newbuf
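>
> If many different tags show up, a rough way to find the busiest files (a
> sketch; it assumes the formatted trace is plain text and the fields look like
> the FIO line above) would be:
>
> # count how often each inode (the token after "tag") appears among the
> # 8-sector FIO records, then look up the busiest ones with tsfindinode
> grep "TRACE_IO: FIO" <trace-report-file> | grep "nSectors 8 " \
>   | awk '{for (i = 1; i < NF; i++) if ($i == "tag") print $(i + 1)}' \
>   | sort | uniq -c | sort -rn | head
> # then, for a given inode number:
> ./tsfindinode -i <inode-number> /ibm/fs2-16m-09/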
>
> hope that helps.
>
> sven
>
>
>
>
> On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC)
> <andreas.petzold at kit.edu> wrote:
>
> Hi John,
>
> iotop wasn't helpful. It seems to be overwhelmed by what is going on on
> the machine.
>
> Cheers,
>
> Andreas
>
> On 05/30/2017 02:28 PM, John Hearns wrote:
> > Andreas,
> > This is a stupid reply, but please bear with me.
> > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS
> > filesystem) setup.
> > We also had a new application which did post-processing. One of the users
> > reported that a post-processing job would take about 30 minutes.
> > However, when two or more instances of the same application were running,
> > the job would take several hours.
> >
> > We finally found that this slowdown was due to the I/O size; the
> > application was using the default size.
> > We only found this by stracing the application and spending hours staring
> > at the trace...
> >
> > I am sure there are better tools for this, and I do hope you don’t have to
> > strace every application... really.
> > A good tool to get a general feel for I/O patterns is 'iotop'. It might
> > help?
> >
> >
> >
> >
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at spectrumscale.org
> > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of
> > Andreas Petzold (SCC)
> > Sent: Tuesday, May 30, 2017 2:17 PM
> > To: gpfsug-discuss at spectrumscale.org
> > Subject: [gpfsug-discuss] Associating I/O operations with files/processes
> >
> > Dear group,
> >
> > First a quick introduction: at KIT we are running a 20+ PB storage system
> > with several large (1-9 PB) file systems. We have a 14-node NSD server
> > cluster and 5 small (~10 nodes) protocol node clusters, each of which
> > mounts one of the file systems. The protocol nodes run server software
> > (dCache, xrootd) specific to our users, which are primarily the LHC
> > experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are
> > connected via IB, while the protocol nodes communicate with their clients
> > via Ethernet.
> >
> > Now let me describe the problem we are facing. For a few days now, one of
> > the protocol nodes has been showing very strange and as yet unexplained
> > I/O behaviour. Before, we were usually seeing reads like this (iohist
> > example from a well-behaved node):
> >
> > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19
> > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14
> > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14
> > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21
> > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21
> >
> > For a few days now we have been seeing this on the problematic node:
> >
> > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13
> > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14
> > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13
> > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14
> > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14
> > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13
> > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10
> >
> > The number of read ops has gone up by O(1000), which is what one would
> > expect when going from 8192-sector reads to 8-sector reads.
> >
> > We have already excluded problems with the node itself, so we are focusing
> > on the applications running on it. What we'd like to do is associate the
> > I/O requests either with files or with specific processes running on the
> > machine, in order to be able to blame the correct application. Can
> > somebody tell us whether this is possible and, if not, whether there are
> > other ways to understand which application is causing this?
> >
> > Thanks,
> >
> > Andreas
> >
> > --
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Andreas Petzold
Hermann-von-Helmholtz-Platz 1, Building 449, Room 202
D-76344 Eggenstein-Leopoldshafen
Tel: +49 721 608 24916
Fax: +49 721 608 24972
Email: petzold at kit.edu
www.scc.kit.edu
KIT – The Research University in the Helmholtz Association
Since 2010, KIT has been certified as a family-friendly university.