[gpfsug-discuss] Associating I/O operations with files/processes
Andreas Petzold (SCC)
andreas.petzold at kit.edu
Tue May 30 15:00:27 BST 2017
Hi Sven,
We are seeing FileBlockRandomReadFetchHandlerThread.
I'll let you know once I have more results.
Thanks,
Andreas
On 05/30/2017 03:55 PM, Sven Oehme wrote:
> Hi,
>
> The very first thing to do would be to run mmfsadm dump iohist once, instead
> of mmdiag --iohist (we are actually adding this info to mmdiag --iohist in the
> next release), to see if the thread type reveals something:
>
> 07:25:53.578522 W data 1:20260249600 8192 35.930 488076 181 C0A70D0A:59076980 cli 192.167.20.129 Prefetch WritebehindWorkerThread
> 07:25:53.632722 W data 1:20260257792 8192 45.179 627136 173 C0A70D0A:59076980 cli 192.167.20.129 Cleaner CleanBufferThread
> 07:25:53.662067 W data 2:20259815424 8192 45.612 992975086 40 C0A70D0A:59076985 cli 192.167.20.130 Prefetch WritebehindWorkerThread
> 07:25:53.734274 W data 1:19601858560 8 0.624 50237 0 C0A70D0A:59076980 cli 192.167.20.129 MBHandler *DioHandlerThread*
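>
> To pick out just the direct I/O requests from that output, a simple filter
> should do (a sketch; the grep pattern just matches the DioHandlerThread column
> shown above):
>
> # run on the affected protocol node and keep only the direct-I/O requests
> mmfsadm dump iohist | grep -i diohandler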
>
> If you see DioHandlerThread, most likely somebody changed an open flag to use
> O_DIRECT. Without that flag, even if the app does only 4k I/O (which is
> inefficient), GPFS will detect this and do prefetch/writebehind in large
> blocks. As soon as you add O_DIRECT we don't do this anymore, to honor the
> hint, and then every single request gets handled one by one.
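>
> A quick way to check for that from the application side (a sketch; <pid> is
> just a placeholder for the process you suspect, and it assumes strace is
> available on the node) is to watch the open flags:
>
> # attach to the suspected process and look for O_DIRECT in its open()/openat() calls
> strace -f -e trace=open,openat -p <pid> 2>&1 | grep O_DIRECT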
>
> After that, the next thing would be to run a very low-level trace with just
> I/O info, like:
>
> mmtracectl --start --trace=io --tracedev-write-mode=overwrite -N .
>
> This will start collection on the node where you execute the command; if you
> want to run it against a different node, replace the dot at the end with the
> hostname.
> Wait a few seconds and run
>
> mmtracectl --off
>
> You will get a message that the trace is being formatted and eventually a
> trace file.
> Now grep for FIO and you get lines like this:
>
>
> 7.293293470 127182 TRACE_IO: FIO: write data tag 1670183 1 ioVecSize 64 1st buf 0x5C024940000 nsdId C0A71482:5872D94A da 2:51070828544 nSectors 32768 err 0
>
> If you further reduce it to nSectors 8, you would focus only on the 4k I/Os
> you mentioned above.
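>
> For example, something along these lines (a sketch; <trace-report-file> is
> just a placeholder for whatever file mmtracectl reports it wrote):
>
> # keep only the FIO records, then only the 8-sector (4 KiB) requests;
> # the trailing space after the 8 avoids also matching e.g. nSectors 8192
> grep "TRACE_IO: FIO" <trace-report-file> | grep "nSectors 8 "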
>
> The key item in the line above that you care about is the tag (1670183 here);
> this is the inode number of your file.
> If you now do:
>
> cd /usr/lpp/mmfs/samples/util ; make
> then run (obviously replacing the -i argument and the filesystem path with your own)
>
> [root at fire01 util]# ./tsfindinode -i 1670183 /ibm/fs2-16m-09/
>
> and you get a hit like this:
>
> 1670183 0 /ibm/fs2-16m-09//shared/test-newbuf
>
> You now know that the file being accessed in the I/O example above is
> /ibm/fs2-16m-09//shared/test-newbuf
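>
> If many different tags show up, a rough way to find the busiest files (a
> sketch; it assumes the formatted trace is plain text and the fields look like
> the FIO line above) would be:
>
> # count how often each inode (the token after "tag") appears among the
> # 8-sector FIO records, then look up the busiest ones with tsfindinode
> grep "TRACE_IO: FIO" <trace-report-file> | grep "nSectors 8 " \
>   | awk '{for (i = 1; i < NF; i++) if ($i == "tag") print $(i + 1)}' \
>   | sort | uniq -c | sort -rn | head
> # then, for a given inode number:
> ./tsfindinode -i <inode-number> /ibm/fs2-16m-09/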
>
> hope that helps.
>
> sven
>
>
>
>
> On Tue, May 30, 2017 at 6:12 AM Andreas Petzold (SCC)
> <andreas.petzold at kit.edu> wrote:
>
> Hi John,
>
> iotop wasn't helpful. It seems to be overwhelmed by what is going on on
> the machine.
>
> Cheers,
>
> Andreas
>
> On 05/30/2017 02:28 PM, John Hearns wrote:
> > Andreas,
> > This is a stupid reply, but please bear with me.
> > Not exactly GPFS related, but I once managed an SGI CXFS (Clustered XFS
> > filesystem) setup.
> > We also had a new application which did post-processing. One of the users
> > reported that a post-processing job would take about 30 minutes.
> > However, when two or more instances of the same application were running,
> > the job would take several hours.
> >
> > We finally found that this slowdown was due to the I/O size; the
> > application was using the default size.
> > We only found this by stracing the application and spending hours staring
> > at the trace...
> >
> > I am sure there are better tools for this, and I do hope you don’t have to
> > strace every application... really.
> > A good tool to get a general feel for I/O patterns is 'iotop'. It might
> > help?
> >
> >
> >
> >
> > -----Original Message-----
> > From: gpfsug-discuss-bounces at spectrumscale.org
> > [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of
> > Andreas Petzold (SCC)
> > Sent: Tuesday, May 30, 2017 2:17 PM
> > To: gpfsug-discuss at spectrumscale.org
> > Subject: [gpfsug-discuss] Associating I/O operations with files/processes
> >
> > Dear group,
> >
> > First a quick introduction: at KIT we are running a 20+ PB storage system
> > with several large (1-9 PB) file systems. We have a 14-node NSD server
> > cluster and 5 small (~10 nodes) protocol node clusters, each of which
> > mounts one of the file systems. The protocol nodes run server software
> > (dCache, xrootd) specific to our users, which are primarily the LHC
> > experiments at CERN. GPFS version is 4.2.2 everywhere. All servers are
> > connected via IB, while the protocol nodes communicate with their clients
> > via Ethernet.
> >
> > Now let me describe the problem we are facing. For a few days now, one of
> > the protocol nodes has been showing very strange and as yet unexplained
> > I/O behaviour. Before, we were usually seeing reads like this (iohist
> > example from a well-behaved node):
> >
> > 14:03:37.637526 R data 32:138835918848 8192 46.626 cli 0A417D79:58E3B179 172.18.224.19
> > 14:03:37.660177 R data 18:12590325760 8192 25.498 cli 0A4179AD:58E3AE66 172.18.224.14
> > 14:03:37.640660 R data 15:106365067264 8192 45.682 cli 0A4179AD:58E3ADD7 172.18.224.14
> > 14:03:37.657006 R data 35:130482421760 8192 30.872 cli 0A417DAD:58E3B266 172.18.224.21
> > 14:03:37.643908 R data 33:107847139328 8192 45.571 cli 0A417DAD:58E3B206 172.18.224.21
> >
> > For a few days now we have been seeing this on the problematic node:
> >
> > 14:06:27.253537 R data 46:126258287872 8 15.474 cli 0A4179AB:58E3AE54 172.18.224.13
> > 14:06:27.268626 R data 40:137280768624 8 0.395 cli 0A4179AD:58E3ADE3 172.18.224.14
> > 14:06:27.269056 R data 46:56452781528 8 0.427 cli 0A4179AB:58E3AE54 172.18.224.13
> > 14:06:27.269417 R data 47:97273159640 8 0.293 cli 0A4179AD:58E3AE5A 172.18.224.14
> > 14:06:27.269293 R data 49:59102786168 8 0.425 cli 0A4179AD:58E3AE72 172.18.224.14
> > 14:06:27.269531 R data 46:142387326944 8 0.340 cli 0A4179AB:58E3AE54 172.18.224.13
> > 14:06:27.269377 R data 28:102988517096 8 0.554 cli 0A417879:58E3AD08 172.18.224.10
> >
> > The number of read ops has gone up by O(1000), which is what one would
> > expect when going from 8192-sector reads to 8-sector reads.
> >
> > We have already excluded problems with the node itself, so we are focusing
> > on the applications running on it. What we'd like to do is associate the
> > I/O requests either with files or with specific processes running on the
> > machine, in order to be able to blame the correct application. Can
> > somebody tell us whether this is possible and, if not, whether there are
> > other ways to understand which application is causing this?
> >
> > Thanks,
> >
> > Andreas
> >
> > --
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Andreas Petzold
Hermann-von-Helmholtz-Platz 1, Building 449, Room 202
D-76344 Eggenstein-Leopoldshafen
Tel: +49 721 608 24916
Fax: +49 721 608 24972
Email: petzold at kit.edu
www.scc.kit.edu
KIT – The Research University in the Helmholtz Association
Since 2010, KIT has been certified as a family-friendly university.