Jeff Squyres <jsquy...@cisco.com> writes:

> We had a user-reported issue of some hangs that the IB vendors have
> been unable to replicate in their respective labs. We *suspect* that
> it may be an issue with the oob openib CPC, but that code is pretty
> old and pretty mature, so all of us would be at least somewhat
> surprised if that were the case. If anyone can reliably reproduce
> this error, please let us know and/or give us access to your machines

We can reproduce it with IMB. We could provide access, but we'd have
to negotiate with the owners of the relevant nodes to give you
interactive access to them. Maybe Brock's would be more accessible?
(If you contact me, I may not be able to respond for a few days.)

> -- we have not closed this issue,

Which issue? I couldn't find a relevant-looking one.

> but are unable to move forward
> because the customers who reported this issue switched to rdmacm and
> moved on (i.e., we don't have access to their machines to test any
> more).

For what it's worth, I figured out why I couldn't see rdmacm, but
adding ipoib would be a bit of a pain.

--
Excuse the typping -- I have a broken wrist