On Thu, 27 Feb 2014, Jiri Kosina wrote:

> On Thu, 27 Feb 2014, Or Gerlitz wrote:
>
> > ipoib is coded over the verbs API (include/rdma/ib_verbs.h) --- so tracking
> > the path from ipoib through the verbs api into mlx4 should be similar
> > exercise as doing so for mlx5, but let's 1st treat the higher level
> > elements involved with this patch.
> >
> > Can you shed some light why the problem happens only for NFS, and not for
> > example with other IP/TCP storage protocols?
> >
> > For example, do you expect it to happen with iSCSI/TCP too? the Linux
> > iSCSI initiator 1st open a TCP socket from user space to the target,
> > next they do login exchange over this socket and later provide the
> > socket to the kernel iscsi code to use as the back-end of a SCSI block
> > device registered with the SCSI midlayer
>
> Frankly, no idea. There was a problem with swapping over NFS, as writeback
> was deadlocked with memory reclaim (memory needs to be allocated so that
> swap could be accessed to reclaim memory). That's fixed by allocating the
> buffers from PF_MEMALLOC reserve, introduced by Mel's and Peter's patchset
> back in 3.9 or so. Oh, and the same has been done for swapping over NBD,
> btw. Maybe iSCSI needs similar treatment, maybe it has it already, I
> haven't checked. We haven't seen a bugreport for that though.
>
> > > I don't think we have, and it indeed should be rather easy to add. The
> > > more challenging part of the problem is where (and based on which
> > > data) the flag would actually be set up on the netdevice so that it's
> > > not horrible layering violation.
> >
> > I assume that in the same manner netdevices advertize features to the
> > networking core, the core can provide them operating directives after
> > they register themselves.
>
> Whatever suits you best.
> To sum it up:
>
> - mlx4 is confirmed to have this problem, and we know how that problem
>   happens -- see the paragraph in the changelog explaining the dependency
>   between memory reclaim and allocation of TX ring
>
> - we have a work around which requires human interaction in order
>   to provide the information whether GFP_NOFS should be used or not
>
> - I can very well understand why Mellanox would see that as a hack, but if
>   more comprehensive fix is necessary, I'd expect those who understand
>   the code the best to come up with a solution/proposal. I'd assume that
>   you don't want to keep the code with known and easily triggerable
>   deadlock out there unfixed.
>
> - where I see the potential for layering violation in any 'general'
>   solution is that it's the filesystem that has to be "talking" to the
>   underlying netdevice, i.e. you'll have to make filesystem
>   netdevice-aware, right?
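
For anyone following along without the patch at hand, the workaround in the second point above can be sketched roughly as below. This is a userspace model only, not the mlx4 code: the MODEL_* bit values, the model_use_nofs knob and both helper functions are names made up for illustration, and the real flag definitions live in include/linux/gfp.h. The sketch assumes the usual relationship in which GFP_NOFS is GFP_KERNEL without the bit that lets reclaim call back into filesystem code.

```c
#include <stdbool.h>

/* Stand-in bit values for this userspace model only; they are NOT the
 * kernel's values. The real definitions are in include/linux/gfp.h. */
#define MODEL_GFP_WAIT 0x1u  /* allocation may sleep and enter reclaim */
#define MODEL_GFP_IO   0x2u  /* reclaim may start block I/O */
#define MODEL_GFP_FS   0x4u  /* reclaim may call back into filesystem code */

/* Modelled after the assumed relationship: NOFS = KERNEL minus the FS bit. */
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/* Hypothetical knob: the "human interaction" part of the workaround.
 * An administrator sets it when the filesystem sits on top of this netdevice. */
static bool model_use_nofs;

/* Pick the allocation mask for the (modelled) TX ring allocation. */
static unsigned int model_tx_ring_gfp(void)
{
    return model_use_nofs ? MODEL_GFP_NOFS : MODEL_GFP_KERNEL;
}

/* True if reclaim triggered by this allocation may re-enter the filesystem,
 * which is the dependency that closes the deadlock loop. */
static bool model_may_enter_fs(unsigned int gfp)
{
    return (gfp & MODEL_GFP_FS) != 0;
}
```

With model_use_nofs left false, model_may_enter_fs(model_tx_ring_gfp()) is true, i.e. the TX ring allocation can end up waiting on the very filesystem that is waiting for the netdevice. Setting the knob clears the FS bit and breaks that loop, at the cost of someone having to know when to set it.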

Mellanox folks, do you have any plan how to proceed here please?

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/