On Tue, Oct 30, 2018 at 10:11 AM Burakov, Anatoly <anatoly.bura...@intel.com> wrote:
> On 29-Oct-18 2:18 PM, Thomas Monjalon wrote:
> > 29/10/2018 14:40, Alejandro Lucero:
> >> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a....@intel.com> wrote:
> >>> *From:* Alejandro Lucero [mailto:alejandro.luc...@netronome.com]
> >>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <tho...@monjalon.net>
> >>> wrote:
> >>>
> >>> 29/10/2018 12:39, Alejandro Lucero:
> >>>> I got a patch that solves a bug when calling rte_eal_check_dma_mask
> >>>> using the mask instead of the maskbits. However, this does not solve
> >>>> the deadlock.
> >>>
> >>> The deadlock is a bigger concern I think.
> >>>
> >>> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> >>> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
> >>>
> >>> Yao, can you try with the attached patch?
> >>>
> >>> Hi, Lucero
> >>>
> >>> This patch can fix the issue at my side. Thanks a lot
> >>> for your quick action.
> >>
> >> Great!
> >>
> >> I will send an official patch with the changes.
> >
> > Please, do not forget my other request to better comment functions.
> >
> >> I have to say that I tested the patchset, but I think it was when
> >> legacy_mem was still there and therefore the dynamic memory allocation
> >> code was not used during memory initialization.
> >>
> >> There is something that concerns me though. Using
> >> rte_memseg_walk_thread_unsafe could be a problem in some situations,
> >> although those situations are unlikely.
> >>
> >> Usually, calling rte_eal_check_dma_mask happens during initialization.
> >> Then it is safe to use the unsafe function for walking memsegs, but with
> >> device hotplug and dynamic memory allocation, there exists a potential
> >> race condition when the primary process is allocating more memory and
> >> concurrently a device is hotplugged and a secondary process does the
> >> device initialization. For now, this is just a problem with the NFP, and
> >> the potential race condition window is really unlikely, but I will work
> >> on this asap.
> >
> > Yes, this is what concerns me.
> > You can add a comment explaining the unsafe case which is not handled.
>
> The issue here is that this code is called from both memory-locked and
> memory-unlocked context. Virtio had a similar issue with their mem table
> update code - they solved it by manually locking the memory before doing
> everything else, and using the thread_unsafe version of the walk.
>
> Could something like that be done here?
>

I have a patch adding safe and unsafe versions of the dma mask check.
However, because of the multiprocess problem reported, I think the fix
requires a different type of work.

The problem I see now is that calling rte_eal_check_dma_mask from the
set_iova_mode code path is wrong. This cannot be done at that point because
the memory has not been initialized yet.

> >
> >>>> Interestingly, the problem looks like a compiler one. Calling
> >>>> rte_memseg_walk does not return when called inside
> >>>> rte_eal_check_dma_mask, but if you modify the call like this:
> >>>>
> >>>> - if (rte_memseg_walk(check_iova, &mask))
> >>>> + if (!rte_memseg_walk(check_iova, &mask))
> >>>>
> >>>> it works, although the value returned to the invoker changes, of
> >>>> course. But the point here is that it should be the same behaviour
> >>>> when calling rte_memseg_walk as before, and it is not.
> >>>
> >>> Anyway, the coding style requires saving the return value in a variable,
> >>> instead of nesting the call in an "if" condition.
> >>> And the "if" check should be explicitly != 0 because it is not a real
> >>> boolean.
> >>>
> >>> PS: please do not top post and avoid HTML emails, thanks
> >>>
> >
> --
> Thanks,
> Anatoly
>