Hi Bruce, Stephen,

> > Hello,
> >
> > I have seen a case where a secondary DPDK process tries to map a uio
> > resource, in which mmap() is normally given the corresponding virtual
> > address as a hint address. However, in some instances mmap() returns a
> > virtual address that is not the hint address, and this results in
> > rte_panic() and the secondary process goes defunct.
> >
> > This happens from time to time on an embedded device when nr_hugepages
> > is set to 128, but never when nr_hugepages is set to 256 on the same
> > device. My question is, if mmap() can find the correct memory regions
> > when nr_hugepages is set to 256, would it not require fewer resources
> > (and therefore be more likely to pass) at a lower value such as 128?
> >
> > Any ideas what would cause this mmap() behavior at a lower nr_hugepages
> > value?
> >
> > - Stephen
>
> Hi Stephen,
>
> That's a strange one!
> I don't know for definite why this is happening, but here is one possible
> theory. :-)
>
> It could be due to the size of the memory blocks that are getting mmapped.
> When you use 256 pages, the blocks of memory getting mapped may well be
> larger (depending on how fragmented in memory the 2MB pages are), and
> so may be getting mapped at a higher set of address ranges where there is
> more free memory. This set of address ranges is then free in the secondary
> process, and it is similarly able to map the memory.
> With the 128 hugepages, you may be looking for smaller amounts of memory,
> and so the addresses get mapped in at a different spot in the virtual
> address space, one that may be more heavily used. Then when the secondary
> process tries to duplicate the mappings, it already has memory in that
> region in use and the mapping fails.
> In short - one theory is that having bigger blocks to map causes the memory
> to be mapped to a different location in memory which is free from conflicts
> in the secondary process.
>
> So, how to confirm or refute this, and generally debug this issue?
> Well, in general we would need to look at the messages printed out at
> startup in the primary process to see how big of blocks it is trying to
> map in each case, and where they end up in the virtual address-space.

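For illustration, the mmap() behavior under discussion can be reproduced
outside DPDK. The sketch below is not DPDK code; the function names
(map_at_hint, hint_honoured_when_free, hint_honoured_when_occupied) and the
2 MB length are my own assumptions for the example. It assumes Linux and
plain anonymous mappings rather than hugepages or uio. Without MAP_FIXED the
address passed to mmap() is only a hint: if the range is already occupied,
the kernel quietly picks another address, which is the situation the
secondary process ends up treating as fatal.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#define DEMO_LEN (2UL * 1024 * 1024) /* assumed size: one 2 MB block */

/* Request len bytes at hint WITHOUT MAP_FIXED, so the address is only a hint. */
static void *map_at_hint(void *hint, size_t len)
{
    return mmap(hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Hint points at a range that was just unmapped (free).
 * Returns 1 if mmap() returned the hint, 0 if not, -1 on mmap failure. */
int hint_honoured_when_free(void)
{
    void *first = map_at_hint(NULL, DEMO_LEN);
    if (first == MAP_FAILED)
        return -1;
    munmap(first, DEMO_LEN);                 /* the range is free again */

    void *again = map_at_hint(first, DEMO_LEN);
    if (again == MAP_FAILED)
        return -1;
    int honoured = (again == first);
    munmap(again, DEMO_LEN);
    return honoured;
}

/* Hint points at a range that is still mapped (occupied).
 * Returns 1 if mmap() returned the hint, 0 if the kernel chose another
 * address, -1 on mmap failure. */
int hint_honoured_when_occupied(void)
{
    void *busy = map_at_hint(NULL, DEMO_LEN);
    if (busy == MAP_FAILED)
        return -1;

    void *second = map_at_hint(busy, DEMO_LEN); /* collides with busy */
    if (second == MAP_FAILED) {
        munmap(busy, DEMO_LEN);
        return -1;
    }
    int honoured = (second == busy);
    munmap(second, DEMO_LEN);
    munmap(busy, DEMO_LEN);
    return honoured;
}
```

Calling both functions from a small main() and printing the results should
show the occupied case returning 0, since an existing mapping is never
replaced without MAP_FIXED. The free case is normally honoured on Linux, but
that is not something the kernel guarantees.
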
As I remember, the OVDK project had vaguely similar issues (only they were
trying to map hugepages into address space that QEMU had already occupied).
This resulted in us adding a --base-virtaddr EAL command-line flag, which
specifies the virtual address at which the primary process starts mapping
pages. I guess you can try that as well. Just remember that it needs to be
set in the primary process, because the secondary one only copies the
primary's mappings and either succeeds or fails in doing so.

Best regards,
Anatoly Burakov
DPDK SW Engineer