On Fri, Mar 31, 2023 at 03:11:18PM +0530, Prashant Upadhyaya wrote:
> On Thu, Mar 30, 2023 at 7:34 PM Bruce Richardson
> <bruce.richard...@intel.com> wrote:
> >
> > On Thu, Mar 30, 2023 at 07:07:23PM +0530, Prashant Upadhyaya wrote:
> > > On Thu, Mar 30, 2023 at 6:47 PM Bruce Richardson
> > > <bruce.richard...@intel.com> wrote:
> > > >
> > > > On Thu, Mar 30, 2023 at 06:42:58PM +0530, Prashant Upadhyaya wrote:
> > > > > On Thu, Mar 30, 2023 at 2:50 PM Bruce Richardson
> > > > > <bruce.richard...@intel.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > >
> > > > > > FYI, when replying on list, it's best not to top-post, but put your
> > > > > > replies below the email snippet you are replying to.
> > > > > >
> > > > > > > The hash creation API throws the following error --
> > > > > > > RING: Cannot reserve memory for tailq
> > > > > > > HASH: memory allocation failed
> > > > > > >
> > > > > > > The timer subsystem init api throws this error --
> > > > > > > EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested
> > > > > > > memzone segments exceeds RTE_MAX_MEMZONE
> > > > > > >
> > > > > >
> > > > > > Can you try increasing RTE_MAX_MEMZONE. It's defined in DPDK's
> > > > > > rte_config.h file, so edit that and then rebuild DPDK. [If you are
> > > > > > using the built-in DPDK from VPP, you may need to do a patch for
> > > > > > this, add it into the VPP patches directory and then do a VPP
> > > > > > rebuild.]
> > > > > >
> > > > > > Let's see if we can get rid of at least one of the error messages.
> > > > > > :-)
> > > > > >
> > > > > > /Bruce
> > > > > >
> > > > > > > I did check the code and apparently the memzone and rte zmalloc
> > > > > > > related api's are not being able to allocate memory.
> > > > > > >
> > > > > > > Regards
> > > > > > > -Prashant
> > > > > > >
> > > > > > > On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson
> > > > > > > <bruce.richard...@intel.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya
> > > > > > > > wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > While trying to port some code to VPP (which uses DPDK as the
> > > > > > > > > backend driver), I am running into a problem that calls to
> > > > > > > > > API's like rte_timer_subsystem_init, rte_hash_create are
> > > > > > > > > failing while allocation of memory.
> > > > > > > > >
> > > > > > > > > This is presumably because VPP inits the EAL with the
> > > > > > > > > following arguments --
> > > > > > > > >
> > > > > > > > > --in-memory --no-telemetry --file-prefix vpp
> > > > > > > > >
> > > > > > > > > If there is something that can be done eg. passing some more
> > > > > > > > > parms in the EAL initialization which hopefully wouldn't
> > > > > > > > > break VPP but will also be friendly to the RTE timer and hash
> > > > > > > > > functions too, that would be great, so requesting some advice
> > > > > > > > > here.
> > > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > can you provide some more details on what the errors are that
> > > > > > > > you are receiving? Have you been able to dig a little deeper
> > > > > > > > into what might be causing the memory failures? The above flags
> > > > > > > > alone are unlikely to cause issues with hash or timer
> > > > > > > > libraries, for example.
> > > > > > > >
> > > > > > > > /Bruce
> > > > >
> > > > > Thanks Bruce, the error comes from the following function in
> > > > > lib/eal/common/eal_common_memzone.c
> > > > > memzone_reserve_aligned_thread_unsafe
> > > > >
> > > > > The condition which spits out the error is the following
> > > > > if (arr->count >= arr->len)
> > > > > So I printed both of the above values inside this function, and the
> > > > > following output came
> > > > >
> > > > > vpp[14728]: dpdk: EAL init args: --in-memory --no-telemetry
> > > > > --file-prefix vpp
> > > > > [New Thread 0x7fffa67b6700 (LWP 14732)]
> > > > > count: 0 len: 2560
> > > > > count: 1 len: 2560
> > > > > count: 2 len: 2560
> > > > > [New Thread 0x7fffa5fb5700 (LWP 14733)]
> > > > > [New Thread 0x7fffa5db4700 (LWP 14734)]
> > > > > count: 3 len: 2560
> > > > > count: 4 len: 2560
> > > > > ### this is the place where I call rte_timer_subsystem_init from my
> > > > > code, the above must be coming from any other code from VPP/EAL init,
> > > > > the line below is surely because of my call to
> > > > > rte_timer_subsystem_init
> > > > > count: 0 len: 0
> > > > >
> > > > > So as you can see that both values are coming to be zero -- is this
> > > > > expected ? I thought the arr->len should have been non zero.
> > > > > I must add that the thread which is calling the
> > > > > rte_timer_subsystem_init is possibly different than the one which did
> > > > > the eal init, do you think that might be a problem...
> > > > > I am yet to increase the value of RTE_MAX_MEMZONE, but wanted to share
> > > > > the above first for any suggestions.
> > > > >
> > > > Given the lengths you printed above, increasing the MAX_MEMZONE will not
> > > > help things. Is the init call which is failing coming from a non-DPDK
> > > > thread?
> > >
> > > Likely yes, at the moment I am calling it from a CLI which I have added
> > > in VPP.
> > > Assuming this is the case, do you foresee a problem ?
> >
> > Could well be a possible cause, yes.
> > With non-DPDK threads, the memory NUMA
> > node/socket-id entries could be invalid, and cause the DPDK memory
> > allocation to look for memory heaps on non-existent NUMA nodes.
> > Can you try using rte_thread_register API in your thread before calling the
> > init functions and see if that helps.
> >
> > /Bruce
>
> Still no luck !
> I tried two things --
> First, I tried to make the calls from the VPP's fastpath thread (which
> I hoped would be a true DPDK thread internally), but the calls failed
> like before.
> Second, I tried to do a rte_thread_register on this fastpath thread
> before making the calls -- this did not help either, same problem.
> It appears that VPP's memory management has done something so that
> these rte calls are not able to access the expected datastructures at
> DPDK level.
> It appears I am the only guy in the world trying to make these rte
> calls from VPP plugins, I checked on VPP mailing list too and the only
> suggestion I got was to replace DPDK memzone/memory allocator
> functions with those of VPP. This becomes intricate work as I call rte
> timers, hash and rcu functions in my code.
> Can I patch DPDK in a generic fashion to reserve memory from a VPP
> allocator function or even a malloc at some centralized place or a set
> of functions in DPDK ?

If doing any such patching, the patches should add new init functions to
DPDK to allow init with already-allocated memory, or with a custom memory
allocator to allow reserving X amount of memory. That's the only way I can
see to do a generic version here.

> But the larger question is what are we running into here which is
> causing these issues.
>
It's a very good question. I don't know enough about VPP to answer that, but
I suggest you ask on the VPP mailing list, as it's likely others in the VPP
community may be better able to help. I would suggest doing this before
looking into any patching of DPDK; there may be easier workarounds if we know
the exact root cause of the issue.

Regards,
/Bruce
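
To make the "init with already-allocated memory, or with a custom memory allocator" idea a little more concrete, here is a minimal, self-contained C sketch of the general pattern. It is not DPDK code and does not use any DPDK API: `struct mem_ops`, `struct region`, `region_alloc`, `slot_table` and `slot_table_init_with_ops` are all hypothetical names made up for illustration, and the 64-byte alignment is only an assumed cache-line size. A real patch would have to apply the same idea to the actual hash, timer and memzone init paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ALIGN 64 /* assumed cache-line size, illustration only */

/* Hypothetical allocator hooks supplied by the host application.
 * Nothing with these names exists in DPDK. */
struct mem_ops {
    void *(*alloc)(void *ctx, size_t size, size_t align);
    void *ctx;
};

/* A region of memory the caller has already allocated by its own means. */
struct region {
    uint8_t *base;
    size_t   len;
    size_t   used;
};

/* Simple bump allocator over the caller-owned region; returns NULL when
 * the request (plus alignment padding) does not fit. */
static void *region_alloc(void *ctx, size_t size, size_t align)
{
    struct region *r = ctx;
    uintptr_t start, aligned;
    size_t pad, left;

    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

    start = (uintptr_t)r->base + r->used;
    aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    pad = (size_t)(aligned - start);
    left = r->len - r->used;

    if (size > left || pad > left - size)
        return NULL;

    r->used += pad + size;
    return (void *)aligned;
}

/* Stand-in for a library object (a hash table, a timer list, ...). */
struct slot_table {
    uint32_t *slots;
    size_t    n_slots;
};

/* Hypothetical "init with custom allocator" entry point: the library asks
 * the caller's allocator for memory instead of reserving it internally.
 * Returns 0 on success, -1 on failure (the table is left untouched). */
static int slot_table_init_with_ops(struct slot_table *t, size_t n_slots,
                                    const struct mem_ops *ops)
{
    void *mem;

    if (n_slots == 0 || n_slots > SIZE_MAX / sizeof(*t->slots))
        return -1;

    mem = ops->alloc(ops->ctx, n_slots * sizeof(*t->slots), SKETCH_ALIGN);
    if (mem == NULL)
        return -1;

    memset(mem, 0, n_slots * sizeof(*t->slots));
    t->slots = mem;
    t->n_slots = n_slots;
    return 0;
}
```

In this sketch the caller fills a `struct region` with a buffer it obtained however it likes (for instance from the host application's own heap), wraps `region_alloc` in a `struct mem_ops`, and passes that to the init function. The library side never needs to know where the memory came from, which is what would make such an approach generic.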