[gpfsug-discuss] VERBS RDMA issue
Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
aaron.s.knister at nasa.gov
Sun May 21 09:59:38 BST 2017
Hi Tushar,
In my case the issue turned out to be an underlying performance bottleneck: CPU frequency scaling problems were causing cores to throttle back when it wasn't appropriate.
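One way to look for that kind of throttling is to compare the cpufreq governor and current frequency across cores. The following is a minimal sketch, assuming a Linux node that exposes the standard cpufreq sysfs files under /sys/devices/system/cpu; the helper name `cpufreq_summary` is made up for illustration and is not part of GPFS.

```python
from collections import Counter
from pathlib import Path


def cpufreq_summary(root="/sys/devices/system/cpu"):
    """Return (governor counts, {cpu: current kHz}) read from cpufreq sysfs files.

    Assumption: `root` follows the usual Linux layout cpuN/cpufreq/...
    """
    governors = Counter()
    cur_khz = {}
    for cpufreq in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        cpu = cpufreq.parent.name
        gov = cpufreq / "scaling_governor"
        cur = cpufreq / "scaling_cur_freq"
        if gov.is_file():
            governors[gov.read_text().strip()] += 1
        if cur.is_file():
            cur_khz[cpu] = int(cur.read_text().strip())
    return governors, cur_khz


if __name__ == "__main__":
    governors, cur_khz = cpufreq_summary()
    print("governors:", dict(governors))
    if cur_khz:
        print("min/max current kHz:", min(cur_khz.values()), max(cur_khz.values()))
```

A mix of governors, or a large spread between the minimum and maximum current frequency while the node is under load, would be a hint worth following up; it is not proof of a problem on its own.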
I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions, although I don't remember what those were. Hopefully others can chime in and qualify that.
Are you seeing any RDMA errors in your logs (e.g. grep for IBV_ in mmfs.log)?
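If the grep returns a lot of lines, it can help to tally them by error name. A minimal sketch of that, assuming the log is plain text and that the interesting entries contain tokens starting with IBV_; the default path /var/adm/ras/mmfs.log.latest and the helper name `count_ibv_tokens` are assumptions for illustration, so adjust them for your site.

```python
import re
import sys
from collections import Counter

# Any identifier-like token beginning with IBV_, e.g. an ibverbs status name.
IBV_TOKEN = re.compile(r"\bIBV_[A-Z0-9_]+")


def count_ibv_tokens(lines):
    """Count each IBV_* token seen in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(IBV_TOKEN.findall(line))
    return counts


if __name__ == "__main__":
    # Assumed default log location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/ras/mmfs.log.latest"
    with open(path, errors="replace") as log:
        for token, n in count_ibv_tokens(log).most_common():
            print(f"{n:8d}  {token}")
```

Running it across a few nodes and comparing the tallies may show whether errors are concentrated on particular hosts or spread evenly.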
-Aaron
On May 21, 2017 at 04:41:00 EDT, Tushar Pathare <tpathare at sidra.org> wrote:
Hello Team,
We are seeing a lot of waiters with the message "waiting for conn rdmas < conn maxrdmas" <https://www.mail-archive.com/search?l=gpfsug-discuss@spectrumscale.org&q=subject:%22Re%5C%3A+%5C%5Bgpfsug%5C-discuss%5C%5D+waiting+for+conn+rdmas+%3C+conn+maxrdmas%22&o=newest>.
Are there any recommended settings to resolve this issue?
Our RDMA configuration for 140 nodes (32 cores each) is as follows:
VERBS RDMA Configuration:
Status : started
Start time : Thu
Stats reset time : Thu
Dump time : Sun
mmfs verbsRdma : enable
mmfs verbsRdmaCm : disable
mmfs verbsPorts : mlx4_0/1 mlx4_0/2
mmfs verbsRdmasPerNode : 3200
mmfs verbsRdmasPerNode (max) : 3200
mmfs verbsRdmasPerNodeOptimize : yes
mmfs verbsRdmasPerConnection : 16
mmfs verbsRdmasPerConnection (max) : 16
mmfs verbsRdmaMinBytes : 16384
mmfs verbsRdmaRoCEToS : -1
mmfs verbsRdmaQpRtrMinRnrTimer : 18
mmfs verbsRdmaQpRtrPathMtu : 2048
mmfs verbsRdmaQpRtrSl : 0
mmfs verbsRdmaQpRtrSlDynamic : no
mmfs verbsRdmaQpRtrSlDynamicTimeout : 10
mmfs verbsRdmaQpRtsRnrRetry : 6
mmfs verbsRdmaQpRtsRetryCnt : 6
mmfs verbsRdmaQpRtsTimeout : 18
mmfs verbsRdmaMaxSendBytes : 16777216
mmfs verbsRdmaMaxSendSge : 27
mmfs verbsRdmaSend : yes
mmfs verbsRdmaSerializeRecv : no
mmfs verbsRdmaSerializeSend : no
mmfs verbsRdmaUseMultiCqThreads : yes
mmfs verbsSendBufferMemoryMB : 1024
mmfs verbsLibName : libibverbs.so
mmfs verbsRdmaCmLibName : librdmacm.so
mmfs verbsRdmaMaxReconnectInterval : 60
mmfs verbsRdmaMaxReconnectRetries : -1
mmfs verbsRdmaReconnectAction : disable
mmfs verbsRdmaReconnectThreads : 32
mmfs verbsHungRdmaTimeout : 90
ibv_fork_support : true
Max connections : 196608
Max RDMA size : 16777216
Target number of vsend buffs : 16384
Initial vsend buffs per conn : 59
nQPs : 140
nCQs : 282
nCMIDs : 0
nDtoThreads : 2
nextIndex : 141
Number of Devices opened : 1
Device : mlx4_0
vendor_id : 713
Device vendor_part_id : 4099
Device mem register chunk : 8589934592 (0x200000000)
Device max_sge : 32
Adjusted max_sge : 0
Adjusted max_sge vsend : 30
Device max_qp_wr : 16351
Device max_qp_rd_atom : 16
Open Connect Ports : 1
verbsConnectPorts[0] : mlx4_0/1/0
lid : 129
state : IBV_PORT_ACTIVE
path_mtu : 2048
interface ID : 0xe41d2d030073b9d1
sendChannel.ib_channel : 0x7FA6CB816200
sendChannel.dtoThreadP : 0x7FA6CB821870
sendChannel.dtoThreadId : 12540
sendChannel.nFreeCq : 1
recvChannel.ib_channel : 0x7FA6CB81D590
recvChannel.dtoThreadP : 0x7FA6CB822BA0
recvChannel.dtoThreadId : 12541
recvChannel.nFreeCq : 1
ibv_cq : 0x7FA2724C81F8
ibv_cq.cqP : 0x0
ibv_cq.nEvents : 0
ibv_cq.contextP : 0x0
ibv_cq.ib_channel : 0x0
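To compare a dump like the one above between nodes, it may be easier to load it into a dictionary first. A minimal sketch, assuming the dump has been saved as text in the same "name : value" layout shown above; the helper name `parse_verbs_dump` and the choice of settings printed at the end are illustrative assumptions, not a tuning recommendation.

```python
import re
import sys

# "name : value" with an optional leading "mmfs " prefix, as in the listing above.
DUMP_LINE = re.compile(r"^\s*(?:mmfs\s+)?(?P<key>.+?)\s+:\s+(?P<value>.*\S)\s*$")


def parse_verbs_dump(text):
    """Parse 'name : value' lines into a dict of strings; other lines are skipped."""
    settings = {}
    for line in text.splitlines():
        match = DUMP_LINE.match(line)
        if match:
            settings[match.group("key")] = match.group("value")
    return settings


if __name__ == "__main__":
    settings = parse_verbs_dump(sys.stdin.read())
    # Settings picked for illustration only (assumption).
    for key in ("verbsRdmasPerConnection", "verbsRdmasPerNode",
                "verbsRdmaSend", "nQPs"):
        print(f"{key} = {settings.get(key)}")
```

Values are kept as strings so that entries such as port lists or hexadecimal addresses survive unchanged; convert individual fields to integers only where a comparison needs it.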
Thanks
Tushar B Pathare MBA IT,BE IT
Bigdata & GPFS
Software Development & Databases
Scientific Computing
Bioinformatics Division
Research
"What ever the mind of man can conceive and believe, drill can query"
Sidra Medical and Research Centre
Sidra OPC Building
Sidra Medical & Research Center
PO Box 26999
Al Luqta Street
Education City North Campus
Qatar Foundation, Doha, Qatar
Office 4003 3333 ext 37443 | M +974 74793547
tpathare at sidra.org | www.sidra.org