Hello Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
> Sent: Wednesday, March 21, 2018 8:18 PM
> To: Shreyansh Jain <shreyansh.j...@nxp.com>
> Cc: dev@dpdk.org; Hemant Agrawal <hemant.agra...@nxp.com>
> Subject: Re: [dpdk-dev] [PATCH v2 00/41] Memory Hotplug for DPDK
>
> [...]
> >>
> >
> > While working on the issue reported in [1], I have found another issue
> > with which I might need your help.
> >
> > [1] http://dpdk.org/ml/archives/dev/2018-March/093202.html
> >
> > For [1], I bypassed it by changing the mempool_add_elem code for the time
> > being - it now allows non-contiguous (not explicitly demanded
> > contiguous) allocations to go through rte_mempool_populate_iova. With
> > that, I was able to get DPAA2 working.
> >
> > Problem is:
> > 1. When I am working with 1GB pages, I/O is working fine.
> > 2. When using 2MB pages (1024 num), the initialization somewhere after
> >    the VFIO layer fails.
> >
> > All with IOVA=VA mode.
> >
> > Some logs:
> >
> > This is the output of the virtual memory layout demanded by DPDK:
> >
> > --->8---
> > EAL: Ask a virtual area of 0x2e000 bytes
> > EAL: Virtual area found at 0xffffb6561000 (size = 0x2e000)
> > EAL: Setting up physically contiguous memory...
> > EAL: Ask a virtual area of 0x59000 bytes
> > EAL: Virtual area found at 0xffffb6508000 (size = 0x59000)
> > EAL: Memseg list allocated: 0x800kB at socket 0
> > EAL: Ask a virtual area of 0x400000000 bytes
> > EAL: Virtual area found at 0xfffbb6400000 (size = 0x400000000)
> > EAL: Ask a virtual area of 0x59000 bytes
> > EAL: Virtual area found at 0xfffbb62af000 (size = 0x59000)
> > EAL: Memseg list allocated: 0x800kB at socket 0
> > EAL: Ask a virtual area of 0x400000000 bytes
> > EAL: Virtual area found at 0xfff7b6200000 (size = 0x400000000)
> > EAL: Ask a virtual area of 0x59000 bytes
> > EAL: Virtual area found at 0xfff7b6056000 (size = 0x59000)
> > EAL: Memseg list allocated: 0x800kB at socket 0
> > EAL: Ask a virtual area of 0x400000000 bytes
> > EAL: Virtual area found at 0xfff3b6000000 (size = 0x400000000)
> > EAL: Ask a virtual area of 0x59000 bytes
> > EAL: Virtual area found at 0xfff3b5dfd000 (size = 0x59000)
> > EAL: Memseg list allocated: 0x800kB at socket 0
> > EAL: Ask a virtual area of 0x400000000 bytes
> > EAL: Virtual area found at 0xffefb5c00000 (size = 0x400000000)
> > --->8---
> >
> > Then, somehow VFIO mapping is able to find only a single page to map
> >
> > --->8---
> > EAL: Device (dpci.1) abstracted from VFIO
> > EAL: -->Initial SHM Virtual ADDR FFFBB6400000
> > EAL: -----> DMA size 0x200000
> > EAL: Total 1 segments found.
> > --->8---
> >
> > Then, these logs appear probably when DPAA2 code requests for memory.
> > I am not sure why it repeats the same '...expanded by 10MB'.
> >
> > --->8---
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 2MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > EAL: Calling mem event callback vfio_mem_event_clbEAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was expanded by 10MB
> > LPM or EM none selected, default LPM on
> > Initializing port 0 ...
> > --->8---
> >
> > l3fwd is stuck at this point. What I observe is that DPAA2 driver has
> > gone ahead to register the queues (queue_setup) with hardware and the
> > memory has either overrun (smaller than requested size mapped) or the
> > addresses are corrupt (that is, not dma-able). (I get SMMU faults,
> > indicating one of these cases)
> >
> > There is some change from you in the fslmc/fslmc_vfio.c file
> > (rte_fslmc_vfio_dmamap()). Ideally, that code should have walked over
> > all the available pages for mapping but that didn't happen and only a
> > single virtual area got dma-mapped.
> >
> > --->8---
> > EAL: Device (dpci.1) abstracted from VFIO
> > EAL: -->Initial SHM Virtual ADDR FFFBB6400000
> > EAL: -----> DMA size 0x200000
> > EAL: Total 1 segments found.
> > --->8---
> >
> > I am looking into this, but if some hint comes to your
> > mind, it might help.
> >
> > Regards,
> > Shreyansh
> >
>
> Hi Shreyansh,
>
> Thanks for the feedback.
>
> The "heap on socket 0 was expanded by 10MB" has to do with
> synchronization requests in primary/secondary processes. I can see
> you're allocating LPM tables - that's most likely what these allocations
> are about (it's hotplugging memory).

I get that, but why is the same message printed multiple times without any
change in the expansion? Further, I don't have multiple processes - in fact,
I'm working with a single datapath thread.
Anyway, I will look through the code for this.

>
> I think I might have an idea what is going on. I am assuming that you
> are starting up your DPDK application without any -m or --socket-mem
> flags, which means you are starting with an empty heap.

Yes, no specific --socket-mem passed as argument.

>
> During initialization, certain DPDK features (such as service cores,
> PMDs) allocate memory. Most likely you have essentially started up with
> one 2M page, which is what you see in fslmc logs: this page gets mapped
> for VFIO.

Agree.

>
> Then, you allocate a bunch of LPM tables, which trigger more memory
> allocation, and trigger memory allocation callbacks registered through
> rte_mem_event_register_callback(). One of these callbacks is a VFIO
> callback, which is registered in eal_vfio.c:rte_vfio_enable(). However,
> since fslmc bus has its own VFIO implementation that is independent of
> what happens in EAL VFIO code, what probably happens is that the fslmc
> bus misses the necessary messages from the memory hotplug to map
> additional resources for DMA.

Makes sense.

>
> Try adding a rte_mem_event_register_callback() somewhere in fslmc init
> so that it calls the necessary map function.
> eal_vfio.c:vfio_mem_event_callback() should provide a good template on
> how to approach creating such a callback. Let me know if this works!

OK. I will give this a try and update you.

>
> (as a side note, how can we extend VFIO to move this stuff back into EAL
> and expose it as an API?)

The problem is that the FSLMC VFIO driver is slightly different from the
generic VFIO layer, in the sense that a device in a VFIO container is
actually another level of container.
Anyway, I will have a look at how much generalization is possible. Or else,
I will work with the vfio_mem_event_callback() as suggested above.

Thanks for the suggestions.

>
> --
> Thanks,
> Anatoly
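
For reference, the suggestion above could be modelled roughly as below. This is only a self-contained sketch in plain C, not DPDK code: every name in it (mem_event_register, heap_expand, bus_init, bus_dma_mapped and so on) is a hypothetical stand-in, and the real registration call and callback signature should be taken from the EAL sources (eal_vfio.c:vfio_mem_event_callback() being the template mentioned above). It only illustrates why a bus with its own VFIO handling keeps a single 0x200000 mapping unless it subscribes to the hotplug events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; this is NOT the DPDK API. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };
typedef void (*mem_event_cb_t)(enum mem_event ev, uint64_t va, size_t len);

#define MAX_CALLBACKS 8
mem_event_cb_t callbacks[MAX_CALLBACKS];
size_t n_callbacks;

/* Bus-private bookkeeping: number of bytes currently DMA-mapped. */
size_t bus_dma_mapped;

/* Models the EAL-side list of memory event listeners. */
int mem_event_register(mem_event_cb_t cb)
{
    if (cb == NULL || n_callbacks == MAX_CALLBACKS)
        return -1;
    callbacks[n_callbacks++] = cb;
    return 0;
}

/* Deliver one hotplug event to every registered listener. */
void mem_event_notify(enum mem_event ev, uint64_t va, size_t len)
{
    for (size_t i = 0; i < n_callbacks; i++)
        callbacks[i](ev, va, len);
}

/* Models the heap growing or shrinking at runtime (memory hotplug). */
void heap_expand(uint64_t va, size_t len) { mem_event_notify(MEM_EVENT_ALLOC, va, len); }
void heap_shrink(uint64_t va, size_t len) { mem_event_notify(MEM_EVENT_FREE, va, len); }

/* Stand-ins for the bus-specific DMA map/unmap; they only count bytes. */
void bus_dma_map(uint64_t va, size_t len)
{
    (void)va;
    bus_dma_mapped += len;
}

void bus_dma_unmap(uint64_t va, size_t len)
{
    (void)va;
    assert(len <= bus_dma_mapped); /* never unmap more than was mapped */
    bus_dma_mapped -= len;
}

/* The callback the bus would register: keep DMA mappings in step with the heap. */
void bus_mem_event_cb(enum mem_event ev, uint64_t va, size_t len)
{
    if (ev == MEM_EVENT_ALLOC)
        bus_dma_map(va, len);
    else
        bus_dma_unmap(va, len);
}

/* Bus init: map the memory present at startup, then optionally subscribe. */
int bus_init(uint64_t va, size_t len, int subscribe)
{
    bus_dma_map(va, len);
    return subscribe ? mem_event_register(bus_mem_event_cb) : 0;
}

/* Reset the simulation state between runs. */
void sim_reset(void)
{
    n_callbacks = 0;
    bus_dma_mapped = 0;
}
```

In this model, calling bus_init() with subscribe = 0 and then heap_expand() leaves bus_dma_mapped at the initial 0x200000, which matches the "Total 1 segments found" symptom; with subscribe = 1 each expansion is mapped as it happens.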