On Thu, Feb 27, 2014 at 2:42 AM, Jiri Kosina <jkos...@suse.cz> wrote:
> Whatever suits you best. To sum it up:
>
> - mlx4 is confirmed to have this problem, and we know how that problem
>   happens -- see the paragraph in the changelog explaining the dependency
>   between memory reclaim and allocation of TX ring
>
> - we have a work around which requires human interaction in order
>   to provide the information whether GFP_NOFS should be used or not
>
> - I can very well understand why Mellanox would see that as a hack, but if
>   more comprehensive fix is necessary, I'd expect those who understand
>   the code the best to come up with a solution/proposal. I'd assume that
>   you don't want to keep the code with known and easily triggerable
>   deadlock out there unfixed.
>
> - where I see the potential for layering violation in any 'general'
>   solution is that it's the filesystem that has to be "talking" to the
>   underlying netdevice, i.e. you'll have to make filesystem
>   netdevice-aware, right?
It's quite clear that this is a general problem with IPoIB connected mode on any IB device. In connected mode, a packet send can trigger establishing a new connection, which allocates a new QP, which in particular allocates memory for the QP in the low-level IB device driver. Currently I'm positive that every driver does GFP_KERNEL allocations when allocating a QP (ehca does both a GFP_KERNEL kmem_cache allocation and a vmalloc() in internal_create_qp(); mlx5 and mthca are similar to mlx4; and qib does vmalloc() in qib_create_qp()). So this patch needs to be extended to the other four IB device drivers in the tree.

Also, I don't think GFP_NOFS is enough -- it seems we need GFP_NOIO, since we could be swapping to a block device over iSCSI over IPoIB-CM, so even non-FS allocations could deadlock.

I don't think it makes any sense to have a "do_not_deadlock" module parameter, especially one that defaults to "false". If this is the right thing to do, then we should just do it unconditionally.

It does seem that only using GFP_NOIO when we really need to would be a very difficult problem -- how can we carry the information about whether a particular packet is involved in freeing memory through all the layers of, say, NFS, TCP, IPsec, bonding, etc.?
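To make the shape of the "extend the patch to every driver" option concrete, something like the sketch below. All names here (ib_qp_init_attr_sketch, qp_ctx_alloc) are made up for illustration -- this is not a patch against the tree, just the pattern each driver would have to follow:

#include <linux/gfp.h>
#include <linux/slab.h>

/*
 * Sketch only: the caller records which allocation context is safe in
 * the init attributes, and the low-level driver then has to honor it
 * for every allocation it does on the QP creation path.
 */
struct ib_qp_init_attr_sketch {
	/* ... existing ib_qp_init_attr fields ... */
	gfp_t	gfp;	/* GFP_KERNEL normally, GFP_NOIO for IPoIB-CM */
};

static void *qp_ctx_alloc(struct ib_qp_init_attr_sketch *attr, size_t size)
{
	/*
	 * kmalloc() can take the restricted mask directly.  vmalloc()
	 * cannot -- it has no gfp_t parameter -- so the drivers that
	 * vmalloc() their QP state (ehca, qib) would need a different
	 * allocation strategy, not just a flag change.
	 */
	return kmalloc(size, attr->gfp);
}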
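The alternative that avoids touching all five drivers would be to mark the whole task as no-I/O around connection setup, so that everything allocated underneath is implicitly degraded to GFP_NOIO. memalloc_noio_save()/memalloc_noio_restore() (merged in 3.9) exist for exactly this; in the sketch below, ipoib_cm_alloc_qp() is a made-up stand-in for the real QP creation call chain:

#include <linux/sched.h>

/* hypothetical stand-in for the real QP creation path */
extern int ipoib_cm_alloc_qp(void);

static int ipoib_cm_alloc_qp_noio(void)
{
	unsigned int noio_flags;
	int ret;

	/*
	 * PF_MEMALLOC_NOIO (set by memalloc_noio_save()) makes reclaim
	 * for every allocation this task does behave as if GFP_NOIO had
	 * been passed, including allocations buried deep in the
	 * low-level driver and even vmalloc(), which we couldn't reach
	 * by threading a gfp_t through the API.
	 */
	noio_flags = memalloc_noio_save();
	ret = ipoib_cm_alloc_qp();
	memalloc_noio_restore(noio_flags);

	return ret;
}

Note that neither sketch answers the question in the last paragraph -- both make the QP creation path GFP_NOIO unconditionally, rather than only when the packet being sent is actually part of freeing memory.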