On Thu, Mar 30, 2023 at 07:07:23PM +0530, Prashant Upadhyaya wrote:
> On Thu, Mar 30, 2023 at 6:47 PM Bruce Richardson
> <bruce.richard...@intel.com> wrote:
> >
> > On Thu, Mar 30, 2023 at 06:42:58PM +0530, Prashant Upadhyaya wrote:
> > > On Thu, Mar 30, 2023 at 2:50 PM Bruce Richardson
> > > <bruce.richard...@intel.com> wrote:
> > > >
> > > > On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote:
> > > > > Hi,
> > > > >
> > > >
> > > > FYI, when replying on list, it's best not to top-post, but put your
> > > > replies below the email snippet you are replying to.
> > > >
> > > > > The hash creation API throws the following error --
> > > > > RING: Cannot reserve memory for tailq
> > > > > HASH: memory allocation failed
> > > > >
> > > > > The timer subsystem init API throws this error --
> > > > > EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE
> > > > >
> > > >
> > > > Can you try increasing RTE_MAX_MEMZONE? It's defined in DPDK's
> > > > rte_config.h file, so edit that and then rebuild DPDK. [If you are
> > > > using the built-in DPDK from VPP, you may need to do a patch for this,
> > > > add it into the VPP patches directory and then do a VPP rebuild.]
> > > >
> > > > Let's see if we can get rid of at least one of the error messages. :-)
> > > >
> > > > /Bruce
> > > >
> > > > > I did check the code, and apparently the memzone and rte_zmalloc
> > > > > related APIs are unable to allocate memory.
> > > > >
> > > > > Regards
> > > > > -Prashant
> > > > >
> > > > > On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson
> > > > > <bruce.richard...@intel.com> wrote:
> > > > > >
> > > > > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > While trying to port some code to VPP (which uses DPDK as the
> > > > > > > backend driver), I am running into a problem: calls to APIs like
> > > > > > > rte_timer_subsystem_init and rte_hash_create are failing while
> > > > > > > allocating memory.
> > > > > > >
> > > > > > > This is presumably because VPP inits the EAL with the following
> > > > > > > arguments --
> > > > > > >
> > > > > > > --in-memory --no-telemetry --file-prefix vpp
> > > > > > >
> > > > > > > Is there something that can be done, e.g. passing some more
> > > > > > > parameters to the EAL initialization, which wouldn't break VPP
> > > > > > > but would also be friendly to the RTE timer and hash functions?
> > > > > > > Any advice here would be appreciated.
> > > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Can you provide some more details on what the errors are that you
> > > > > > are receiving? Have you been able to dig a little deeper into what
> > > > > > might be causing the memory failures? The above flags alone are
> > > > > > unlikely to cause issues with hash or timer libraries, for example.
> > > > > >
> > > > > > /Bruce
> > >
> > > Thanks Bruce, the error comes from the following function in
> > > lib/eal/common/eal_common_memzone.c
> > > memzone_reserve_aligned_thread_unsafe
> > >
> > > The condition which spits out the error is the following
> > > if (arr->count >= arr->len)
> > > So I printed both of the above values inside this function, and the
> > > following output came
> > >
> > > vpp[14728]: dpdk: EAL init args: --in-memory --no-telemetry --file-prefix vpp
> > > [New Thread 0x7fffa67b6700 (LWP 14732)]
> > > count: 0 len: 2560
> > > count: 1 len: 2560
> > > count: 2 len: 2560
> > > [New Thread 0x7fffa5fb5700 (LWP 14733)]
> > > [New Thread 0x7fffa5db4700 (LWP 14734)]
> > > count: 3 len: 2560
> > > count: 4 len: 2560
> > > ### this is the place where I call rte_timer_subsystem_init from my
> > > code, the above must be coming from any other code from VPP/EAL init,
> > > the line below is surely because of my call to
> > > rte_timer_subsystem_init
> > > count: 0 len: 0
> > >
> > > So as you can see, both values come out as zero for my call -- is this
> > > expected? I thought arr->len should have been non-zero.
> > > I must add that the thread which calls rte_timer_subsystem_init is
> > > possibly different from the one which did the EAL init. Do you think
> > > that might be a problem?
> > > I have yet to increase the value of RTE_MAX_MEMZONE, but wanted to share
> > > the above first for any suggestions.
> > >
> > Given the lengths you printed above, increasing the MAX_MEMZONE will not
> > help things. Is the init call which is failing coming from a non-DPDK
> > thread?
> 
> Likely yes, at the moment I am calling it from a CLI which I have added
> in VPP.
> Assuming this is the case, do you foresee a problem?

Could well be a possible cause, yes. With non-DPDK threads, the memory NUMA
node/socket-id entries could be invalid, and cause the DPDK memory
allocation to look for memory heaps on non-existent NUMA nodes.
Can you try using the rte_thread_register API in your thread before calling
the init functions, and see if that helps?

/Bruce
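
For reference, the suggestion above could be tried along the following lines.
This is only a minimal sketch, not code from VPP or DPDK. The helper name
init_timers_from_non_eal_thread and the HAVE_DPDK build flag are hypothetical.
rte_thread_register() and rte_timer_subsystem_init() are the real DPDK calls
named in this thread. The stand-in bodies in the #else branch exist only so
the sketch compiles without DPDK installed; they model the hypothesis
discussed here (init fails on an unregistered thread), not actual library
behaviour.

```c
#include <stdio.h>

#ifdef HAVE_DPDK                 /* hypothetical build flag, not a DPDK macro */
#include <rte_lcore.h>           /* rte_thread_register() */
#include <rte_timer.h>           /* rte_timer_subsystem_init() */
#else
/* Stand-ins so the sketch compiles without DPDK. They only mimic the
 * return-code convention (0 on success, negative on failure) and the
 * ASSUMPTION that init fails when the calling thread is not registered. */
static int stub_thread_registered;
static int rte_thread_register(void) { stub_thread_registered = 1; return 0; }
static int rte_timer_subsystem_init(void)
{
    return stub_thread_registered ? 0 : -1;
}
#endif

/* Hypothetical helper: call once from the non-EAL thread (for example the
 * VPP CLI handler) before any DPDK init call that allocates memory. */
int init_timers_from_non_eal_thread(void)
{
    /* Register this thread with DPDK so it gets a valid lcore/socket view. */
    if (rte_thread_register() != 0) {
        fprintf(stderr, "rte_thread_register failed\n");
        return -1;
    }

    /* Any non-zero result is treated as a failure here; adjust this if your
     * DPDK version reports "already initialized" with a distinct code. */
    int rc = rte_timer_subsystem_init();
    if (rc != 0) {
        fprintf(stderr, "rte_timer_subsystem_init failed: %d\n", rc);
        return -1;
    }
    return 0;
}
```

To try it against a real DPDK install, build with -DHAVE_DPDK plus the usual
DPDK compiler and linker flags, and call the helper from the CLI handler
thread. Whether this resolves the "count: 0 len: 0" case is not confirmed in
this thread.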
