-----Original Message-----
> Date: Fri, 20 Oct 2017 09:57:58 +0800
> From: Jia He <hejia...@gmail.com>
> To: "Ananyev, Konstantin" <konstantin.anan...@intel.com>, "Zhao, Bing"
> <iloveth...@163.com>, Jerin Jacob <jerin.ja...@caviumnetworks.com>
> Cc: Olivier MATZ <olivier.m...@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>,
> "jia...@hxt-semitech.com" <jia...@hxt-semitech.com>,
> "jie2....@hxt-semitech.com" <jie2....@hxt-semitech.com>,
> "bing.z...@hxt-semitech.com" <bing.z...@hxt-semitech.com>, "Richardson,
> Bruce" <bruce.richard...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
> loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
> Thunderbird/52.4.0
>
>
>
> On 10/20/2017 4:02 AM, Ananyev, Konstantin Wrote:
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev, Konstantin
> > > Sent: Thursday, October 19, 2017 3:15 PM
> > > To: Zhao, Bing <iloveth...@163.com>; Jia He <hejia...@gmail.com>; Jerin
> > > Jacob <jerin.ja...@caviumnetworks.com>
> > > Cc: Olivier MATZ <olivier.m...@6wind.com>; dev@dpdk.org;
> > > jia...@hxt-semitech.com; jie2....@hxt-semitech.com; bing.zhao@hxt-
> > > semitech.com
> > > Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
> > > loading when doing enqueue/dequeue
> > >
> > > > Hi,
> > > >
> > > > On 2017/10/19 18:02, Ananyev, Konstantin wrote:
> > > > > Hi Jia,
> > > > >
> > > > > > Hi
> > > > > >
> > > > > >
> > > > > > On 10/13/2017 9:02 AM, Jia He Wrote:
> > > > > > > Hi Jerin
> > > > > > >
> > > > > > >
> > > > > > > On 10/13/2017 1:23 AM, Jerin Jacob Wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > > Date: Thu, 12 Oct 2017 17:05:50 +0000
> > > > > > > > >
> > > > > > > [...]
> > > > > > > > On the same lines,
> > > > > > > >
> > > > > > > > Jia He, jie2.liu, bing.zhao,
> > > > > > > >
> > > > > > > > Is this patch based on code review, or did you see this issue
> > > > > > > > on any of the arm/ppc targets? arm64 will take a performance
> > > > > > > > hit with this change.
> > > > > > sorry, I missed one important piece of information:
> > > > > > our platform is an aarch64 server with 46 CPUs.
> > > > > > If we reduce the number of CPUs involved, the bug occurs less
> > > > > > frequently.
> > > > > >
> > > > > > Yes, the mb barrier impacts performance, but correctness is more
> > > > > > important, isn't it ;-)
> > > > > > Maybe we can find some other lightweight barrier here?
> > > > > >
> > > > > > Cheers,
> > > > > > Jia
> > > > > > > Based on mbuf_autotest, rte_panic() is invoked within seconds.
> > > > > > >
> > > > > > > PANIC in test_refcnt_iter():
> > > > > > > (lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
> > > > > > > 1: [./test(rte_dump_stack+0x38) [0x58d868]]
> > > > > > > Aborted (core dumped)
> > > > >
> > > > > So is it only reproducible with the mbuf refcnt test?
> > > > > Could it be reproduced with some 'pure' ring test
> > > > > (no mempools/mbuf refcnts, etc.)?
> > > > > The reason I am asking: in that test we also have mbuf refcnt updates
> > > > > (that is what the test was created for), and we do some optimizations
> > > > > there too to avoid excessive atomic updates.
> > > > > BTW, if the problem is not reproducible without mbuf refcnt,
> > > > > can I suggest extending the test to:
> > > > > - check that the enqueue() operation was successful
> > > > > - walk through the pool and check/printf the refcnt of each mbuf.
> > > > > Hopefully that would give us some extra information about what is
> > > > > going wrong here.
> > > > > Konstantin
> > > > >
> > > > Currently, the issue has only been found in this case on the ARM
> > > > platform; we are not sure how it behaves on x86_64.
> > > I understand that it is only reproducible on arm so far.
> > > What I am asking is: with dpdk, is there any other way to reproduce it
> > > (on arm) other than running mbuf_autotest?
> > > Something really simple that doesn't use mbuf/mempool etc.?
> > > Just do dequeue/enqueue from multiple threads and check data integrity
> > > at the end?
> > > If not, what makes you think that the problem is precisely in the
> > > rte_ring code? Why not in rte_mbuf, let's say?
> > Actually I reread your original mail and finally got your point.
> > If I understand you correctly, the problem with read reordering here is
> > that after we read prod.tail but before we read cons.head,
> > both cons.head and prod.tail might be updated,
> > but the prod.tail change might go unnoticed by us.
> > As an example:
> > time 0 (cons.head == 0, prod.tail == 0):
> > prod_tail = r->prod.tail; /* due to read reordering */
> > /* prod_tail == 0 */
> >
> > time 1 (cons.head == 5, prod.tail == 5):
> > *old_head = r->cons.head;
> > /* cons.head == 5 */
> > *entries = (prod_tail - *old_head);
> > /* *entries == (0 - 5) == 0xFFFFFFFB */
> >
> > And we will move our cons.head forward even though there are no filled
> > entries in the ring.
> > Is that right?
> Yes
> > As a side note, I wonder why we have here:
> > *entries = (prod_tail - *old_head);
> > instead of:
> > *entries = r->size + prod_tail - *old_head;
> > ?
> Yes, I agree with you about this code line.
> But reordering will still mess things up even after this change (I have
> tested; it is still the same as before).
> I thought *entries was a guard to prevent the consumer from moving
> forward faster than the producer.
> In some cases it is correct for prod_tail to be smaller than *old_head,
> because of the circular queue.
> In other cases it is incorrect, because of read/read reordering.
>
> AFAICT, the root cause here is that the producer tail and consumer head
> depend on each other.
> Thus a memory barrier is necessary.
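[Below is a minimal sketch of the two test extensions suggested above,
assuming DPDK's rte_mempool_obj_iter() walker and assuming idle mbufs
resting in the pool keep refcnt == 1; the helpers dump_refcnt() and
enqueue_or_warn() are hypothetical names, not part of test_mbuf.c.]

#include <stdio.h>
#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_ring.h>

/* rte_mempool_obj_iter() callback: report any mbuf whose refcnt differs
 * from the value assumed for a free mbuf resting in the pool (1). */
static void
dump_refcnt(struct rte_mempool *mp, void *opaque, void *obj,
            unsigned int obj_idx)
{
        struct rte_mbuf *m = obj;

        RTE_SET_USED(mp);
        RTE_SET_USED(opaque);
        if (rte_mbuf_refcnt_read(m) != 1)
                printf("mbuf %u: refcnt=%u\n", obj_idx,
                       (unsigned int)rte_mbuf_refcnt_read(m));
}

/* Checked enqueue: rte_ring_enqueue() returns non-zero on failure,
 * which the current test silently ignores. */
static void
enqueue_or_warn(struct rte_ring *r, struct rte_mbuf *m)
{
        if (rte_ring_enqueue(r, m) != 0)
                printf("enqueue of mbuf %p failed\n", (void *)m);
}

[Walking the pool would then be a one-liner at the end of the iteration:
rte_mempool_obj_iter(refcnt_pool, dump_refcnt, NULL);]

[To make the reordering scenario above concrete, here is a self-contained
sketch of the consumer head-update loop; the structs and names are
simplified stand-ins for rte_ring's (cf. __rte_ring_move_cons_head), and
the barrier placement mirrors the proposed patch, with a GCC acquire
fence standing in for rte_smp_rmb().]

#include <stdint.h>

/* Simplified stand-ins for rte_ring's producer/consumer bookkeeping. */
struct headtail { volatile uint32_t head, tail; };
struct ring { struct headtail prod, cons; };

/* Stand-in for DPDK's rte_smp_rmb(): keep the CPU from reordering the
 * later load of prod.tail before the earlier load of cons.head. */
#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

/* Consumer head-update loop, condensed from the logic discussed above. */
static inline unsigned int
move_cons_head(struct ring *r, unsigned int n,
               uint32_t *old_head, uint32_t *new_head)
{
        uint32_t prod_tail;
        unsigned int entries;

        do {
                *old_head = r->cons.head;
                /*
                 * Without this barrier, an out-of-order arm64 core may have
                 * loaded prod.tail *before* cons.head. A stale prod_tail
                 * paired with a fresh cons.head makes (prod_tail - *old_head)
                 * wrap to a huge unsigned value -- exactly the 0xFFFFFFFB
                 * case in the time-0/time-1 example above.
                 */
                smp_rmb();
                prod_tail = r->prod.tail;
                entries = prod_tail - *old_head; /* filled entries available */
                if (n > entries)
                        n = entries;
                if (n == 0)
                        return 0;
                *new_head = *old_head + n;
        } while (!__sync_bool_compare_and_swap(&r->cons.head,
                                               *old_head, *new_head));
        return n;
}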
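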
Yes. The barrier is necessary. In fact, upstream FreeBSD fixed this issue
for arm64; the DPDK ring implementation is derived from FreeBSD's
buf_ring.h.
https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166

I think the only outstanding issue is how to reduce the performance impact
for arm64. I believe using acquire/release semantics instead of
rte_smp_rmb() will reduce the performance overhead, like the similar ring
implementations below:
freebsd: https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
odp: https://github.com/Linaro/odp/blob/master/platform/linux-generic/pktio/ring.c

Jia,
1) Can you verify that using acquire/release semantics fixes the problem on
your platform? i.e. the use of atomic_load_acq* in the reference code (a
sketch of this approach is at the end of this mail).
2) If so, what is the overhead of acquire/release versus plain
rte_smp_rmb() barriers? Based on that we need to decide which path to take.

Note: this issue won't show up on every arm64 implementation; it occurs on
arm64 implementations with OOO (out-of-order) execution.

>
> Cheers,
> Jia
> >
> > Konstantin
> >
> > > > In another
> > > > mail of this thread, we made a simple test based on this, captured
> > > > some information, and pasted it there. (I pasted the patch there :-))
> > > Are you talking about this one:
> > > http://dpdk.org/dev/patchwork/patch/30405/
> > > ?
> > > It still uses test/test/test_mbuf.c...,
> > > but anyway I don't really understand how mbuf_autotest is supposed
> > > to work with these changes:
> > > @@ -730,7 +739,7 @@ test_refcnt_iter(unsigned int lcore, unsigned int
> > > iter,
> > >                                 rte_ring_enqueue(refcnt_mbuf_ring, m);
> > >                         }
> > >                 }
> > > -               rte_pktmbuf_free(m);
> > > +               // rte_pktmbuf_free(m);
> > >         }
> > > @@ -741,6 +750,12 @@ test_refcnt_iter(unsigned int lcore, unsigned int
> > > iter,
> > >         while (!rte_ring_empty(refcnt_mbuf_ring))
> > >                 ;
> > >
> > > +       if (NULL != m) {
> > > +               if (1 != rte_mbuf_refcnt_read(m))
> > > +                       printf("m ref is %u\n", rte_mbuf_refcnt_read(m));
> > > +               rte_pktmbuf_free(m);
> > > +       }
> > > +
> > >         /* check that all mbufs are back into mempool by now */
> > >         for (wn = 0; wn != REFCNT_MAX_TIMEOUT; wn++) {
> > >                 if ((i = rte_mempool_avail_count(refcnt_pool)) == n) {
> > >
> > > That means all your mbufs (except the last one) will still be allocated.
> > > So the test would fail - as it should, I think.
> > >
> > > > And it seems that Juhamatti & Jacob found some reverting action
> > > > several months ago.
> > > Didn't get that one either.
> > > Konstantin
>
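[For reference, a minimal sketch of the acquire-semantics approach Jerin
asks Jia to verify in (1) above, using the GCC/C11 builtin
__atomic_load_n() as a portable stand-in for FreeBSD's
atomic_load_acq_32(); the struct and field names are illustrative, not
upstream DPDK code.]

#include <stdint.h>

/* Same simplified stand-ins as in the earlier barrier sketch. */
struct headtail { uint32_t head, tail; };
struct ring { struct headtail prod, cons; };

/* Number of filled entries currently visible to a consumer. */
static inline uint32_t
cons_entries(const struct ring *r)
{
        /*
         * Acquire-load of cons.head (cf. atomic_load_acq_32() on
         * br_cons_head in FreeBSD's buf_ring.h): no later load, including
         * prod.tail below, may be reordered before it. On arm64 this
         * compiles to a single ldar rather than a load plus a dmb-based
         * barrier.
         */
        uint32_t cons_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
        uint32_t prod_tail = r->prod.tail;

        return prod_tail - cons_head;
}

[An acquire load only forbids later loads from moving before it, so it pins
down exactly the prod.tail-before-cons.head reordering described earlier,
hopefully at lower cost than a full read barrier; measuring that difference
is point (2) above.]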