On Wed, Aug 16, 2023 at 9:26 PM David Marchand
<david.march...@redhat.com> wrote:
>
> On Wed, Aug 16, 2023 at 8:30 PM Patrick Robb <pr...@iol.unh.edu> wrote:
> > On Wed, Aug 16, 2023 at 10:40 AM David Marchand <david.march...@redhat.com>
> > wrote:
> >>
> >> Patrick, Bruce,
> >>
> >> If it was reported, I either missed it or forgot about it, sorry.
> >> Can you (re)share the context?
> >>
> >>
> >> >
> >> > Does the test suite pass if the mlx5 driver is disabled in the build?
> >> > That could confirm or refute the suspicion of where the issue is, and
> >> > also provide a temporary workaround while this set is merged (possibly
> >> > including support for disabling specific tests, as I suggested in my
> >> > other email).
> >>
> >> Or disabling the driver as Bruce proposes.
> >
> > Okay, we ran the test with the mlx5 driver disabled, and it still fails.
> > So, this might be more of an ARM architecture issue. Ruifeng, are you still
> > seeing this on your test bed?
> >
> > @David you didn't miss anything, we had a unicast with ARM when setting up
> > the new arm container runners for unit testing a few months back. Ruifeng
> > also noticed the same issue and speculated about mlx5 memory leaks. He
> > raised the possibility of disabling the mlx5 driver too, but that option
> > isn't great since we want to have a uniform build process (as much as
> > possible) for our unit testing. Anyways, now we know that that isn't
> > relevant. I'll forward the thread to you in any case - let me know if you
> > have any ideas.
>
> The mention of "memtest1" in the mails rings a bell.
> I will need more detailed logs, or ideally an env where it is reproduced.

It is a "recurring" yet not so well known issue.
This unit test fails if any part of DPDK did not release all
(hugepage backed) memory and the associated hugepages before exiting.

In your case here, there is a virtio-net device that the container
tries to get its hands on, because DPDK scans and probes all available
resources by default (it fails to probe it in this case, but that's
not important).
Triggering this virtio-net probing makes ethdev allocate its shared
memzone for port data, but nothing in ethdev releases the memzone when
exiting.
Fixing this could be tricky... as the current ethdev code is really
vague about which locks protect what (if anything..).

I think we hit this issue in the past, and avoided it by running the
tests with dynamically linked DPDK binaries (which avoids getting the
net drivers loaded).
I can see that you are running the unit tests with a static binary in
the report you sent.
I think the default is shared mode, so I wonder why UNH builds with
static here.

In any case, could you give it a try and switch to
-Ddefault_library=shared (or stop forcing static mode)?


-- 
David Marchand
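
For reference, a minimal sketch of the suggested switch. It assumes a
build directory named "build" (hypothetical, adjust to the CI layout)
and that the unit tests in question are part of the fast-tests suite;
treat it as an illustration, not the exact UNH invocation.

```shell
# Reconfigure an existing build directory to link DPDK as shared libraries
# ("build" is an assumed directory name).
meson configure build -Ddefault_library=shared

# Rebuild with the new setting.
ninja -C build

# Re-run the unit tests (assuming the failing test is in the fast-tests suite).
meson test -C build --suite fast-tests
```

If the build directory is created from scratch, the same option can be
passed to "meson setup build -Ddefault_library=shared" instead.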