>
> Hi,
>
> On 2017/10/19 18:02, Ananyev, Konstantin wrote:
> >
> > Hi Jia,
> >
> >>
> >> Hi
> >>
> >> On 10/13/2017 9:02 AM, Jia He wrote:
> >>> Hi Jerin
> >>>
> >>> On 10/13/2017 1:23 AM, Jerin Jacob wrote:
> >>>> -----Original Message-----
> >>>>> Date: Thu, 12 Oct 2017 17:05:50 +0000
> >>>>>
> >> [...]
> >>>> On the same lines,
> >>>>
> >>>> Jia He, jie2.liu, bing.zhao,
> >>>>
> >>>> Is this patch based on code review, or did you see this issue on any of
> >>>> the arm/ppc targets? arm64 will have a performance impact with this change.
> >> Sorry, I missed one important piece of information:
> >> our platform is an aarch64 server with 46 cpus.
> >> If we reduce the number of cpus involved, the bug occurs less frequently.
> >>
> >> Yes, the mb barrier impacts performance, but correctness is more
> >> important, isn't it ;-)
> >> Maybe we can find some other lightweight barrier here?
> >>
> >> Cheers,
> >> Jia
> >>> Based on mbuf_autotest, the rte_panic will be invoked in seconds.
> >>>
> >>> PANIC in test_refcnt_iter():
> >>> (lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
> >>> 1: [./test(rte_dump_stack+0x38) [0x58d868]]
> >>> Aborted (core dumped)
> >>>
> >
> > So is it only reproducible with the mbuf refcnt test?
> > Could it be reproduced with some 'pure' ring test
> > (no mempools/mbuf refcnt, etc.)?
> > The reason I am asking - in that test we also have mbuf refcnt updates
> > (that is what the test was created for) and we are doing some optimizations
> > there too to avoid excessive atomic updates.
> > BTW, if the problem is not reproducible without mbuf refcnt,
> > can I suggest extending the test with:
> > - a check that the enqueue() operation was successful
> > - a walk through the pool that checks/prints the refcnt of each mbuf.
> > Hopefully that would give us some extra information about what is going
> > wrong here.
> > Konstantin
> >
>
> Currently, the issue is only found in this case on the ARM
> platform; not sure how it goes on the X86_64 platform
I understand that it is only reproducible on arm so far.
What I am asking is - with dpdk, is there any other way to reproduce it (on arm)
other than running mbuf_autotest?
Something really simple that does not use mbuf/mempool, etc.?
Just do dequeue/enqueue from multiple threads and check data integrity at the end?
If not - what makes you think that the problem is precisely in the rte_ring code?
Why not in rte_mbuf, let's say?

> In another
> mail of this thread, we made a simple test based on this, captured
> some information, and pasted it there (I pasted the patch there :-)).

Are you talking about that one: http://dpdk.org/dev/patchwork/patch/30405/ ?
It still uses test/test/test_mbuf.c..., but anyway, I don't really understand
how mbuf_autotest is supposed to work with these changes:

@@ -730,7 +739,7 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
 			rte_ring_enqueue(refcnt_mbuf_ring, m);
 		}
 	}
-	rte_pktmbuf_free(m);
+	// rte_pktmbuf_free(m);
 }

@@ -741,6 +750,12 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
 	while (!rte_ring_empty(refcnt_mbuf_ring))
 		;

+	if (NULL != m) {
+		if (1 != rte_mbuf_refcnt_read(m))
+			printf("m ref is %u\n", rte_mbuf_refcnt_read(m));
+		rte_pktmbuf_free(m);
+	}
+
 	/* check that all mbufs are back into mempool by now */
 	for (wn = 0; wn != REFCNT_MAX_TIMEOUT; wn++) {
 		if ((i = rte_mempool_avail_count(refcnt_pool)) == n) {

That means all your mbufs (except the last one) will still be allocated.
So the test would fail - as it should, I think.

> And
> it seems that Juhamatti & Jacod found some reverting action several
> months ago.

Didn't get that one either.
Konstantin
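
For what it's worth, a minimal sketch of the kind of 'pure' ring test described
above (several lcores doing dequeue/enqueue of distinct tokens on a bare rte_ring,
no mempool/mbuf, with an integrity check at the end) could look roughly like the
code below. This is only an illustration of the idea, not a patch from this
thread: the constants NB_TOKENS and ITERATIONS, the ring name and the overall
structure are assumptions, and it uses the 17.x-era launch API names
(SKIP_MASTER, etc.).

/*
 * Illustrative sketch only - not a patch from this thread.
 * Several lcores dequeue/enqueue distinct tokens on a bare rte_ring;
 * at the end the master drains the ring and checks that every token
 * is still present exactly once.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#include <rte_eal.h>
#include <rte_ring.h>
#include <rte_lcore.h>
#include <rte_launch.h>

#define NB_TOKENS  64       /* distinct objects circulating in the ring */
#define ITERATIONS 100000   /* dequeue/enqueue rounds per worker lcore */

static struct rte_ring *test_ring;

/* Each worker repeatedly pulls one token out and pushes it back. */
static int
worker(void *arg)
{
	void *obj;
	unsigned int i;

	(void)arg;
	for (i = 0; i < ITERATIONS; i++) {
		while (rte_ring_dequeue(test_ring, &obj) != 0)
			; /* ring momentarily empty - retry */
		while (rte_ring_enqueue(test_ring, obj) != 0)
			; /* ring momentarily full - retry */
	}
	return 0;
}

int
main(int argc, char **argv)
{
	unsigned int seen[NB_TOKENS];
	uintptr_t v;
	void *obj;

	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* 128 slots, default multi-producer/multi-consumer mode. */
	test_ring = rte_ring_create("pure_ring", 128, rte_socket_id(), 0);
	if (test_ring == NULL)
		return -1;

	/* Seed the ring with NB_TOKENS distinct, non-NULL values. */
	for (v = 1; v <= NB_TOKENS; v++)
		if (rte_ring_enqueue(test_ring, (void *)v) != 0)
			return -1;

	/* Run the worker on every slave lcore and wait for completion. */
	rte_eal_mp_remote_launch(worker, NULL, SKIP_MASTER);
	rte_eal_mp_wait_lcore();

	/* Drain the ring: every token must come back exactly once. */
	memset(seen, 0, sizeof(seen));
	while (rte_ring_dequeue(test_ring, &obj) == 0) {
		v = (uintptr_t)obj;
		if (v < 1 || v > NB_TOKENS) {
			printf("corrupted token %p\n", obj);
			continue;
		}
		seen[v - 1]++;
	}

	for (v = 0; v < NB_TOKENS; v++)
		if (seen[v] != 1)
			printf("token %lu seen %u time(s)\n",
			       (unsigned long)v + 1, seen[v]);

	return 0;
}

If a failure in such a test could be reproduced on the arm64 box, that would
point at the ring code itself rather than at the mbuf refcnt handling.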