-----Original Message-----
> Date: Fri, 3 Nov 2017 10:55:40 +0800
> From: Jia He <hejia...@gmail.com>
> To: Jerin Jacob <jerin.ja...@caviumnetworks.com>
> Cc: "Ananyev, Konstantin" <konstantin.anan...@intel.com>, "Zhao, Bing"
>  <iloveth...@163.com>, Olivier MATZ <olivier.m...@6wind.com>,
>  "dev@dpdk.org" <dev@dpdk.org>, "jia...@hxt-semitech.com"
>  <jia...@hxt-semitech.com>, "jie2....@hxt-semitech.com"
>  <jie2....@hxt-semitech.com>, "bing.z...@hxt-semitech.com"
>  <bing.z...@hxt-semitech.com>, "Richardson, Bruce"
>  <bruce.richard...@intel.com>, jianbo....@arm.com, hemant.agra...@nxp.com
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
>  loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.4.0
>
> Hi Jerin
>
>
> On 11/2/2017 4:57 PM, Jia He Wrote:
> > Hi, Jerin
> > please see my performance test below
> > On 11/2/2017 3:04 AM, Jerin Jacob Wrote:
> > [...]
> > > Should it be like instead?
> > >
> > > +#else
> > > +        *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
> > > +        const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
> > > __ATOMIC_ACQUIRE);
> > > It would be nice to see how much overhead it gives.ie back to back
> > > __ATOMIC_ACQUIRE.
> > I can NOT test ring_perf_autotest in our server because of the something
> > wrong in PMU counter.
> > All the return value of rte_rdtsc is 0 with and without your provided ko
> > module. I am still
> > investigating the reason.
> >
>
> Hi Jerin
>
> As for the root cause of rte_rdtsc issue, it might be due to the pmu counter
> frequency is too low
>
> in our arm64 server("Amberwing" from qualcom)
>
> [586990.057779] arch_timer_get_cntfrq()=20000000
>
> Only 20MHz instead of 100M/200MHz, and CNTFRQ_EL0 is not even writable in
> kernel space.
That may not be true: as far as I can guess, Linux 'perf' writes those
registers from kernel space. If it really is not writable there, another
option could be to write it from the ATF/secure boot loader.

>
> Maybe the code in ring_perf_autotest needs to be changed?

Increase the "iterations" to measure @ 200MHz.

>
> e.g.
>
>     printf("SC empty dequeue: %.2F\n",
>         (double)(sc_end-sc_start) / iterations);
>     printf("MC empty dequeue: %.2F\n",
>         (double)(mc_end-mc_start) / iterations);
>
> Otherwise it is always 0 if the time difference divides by iterations.
>
>
> --
> Cheers,
> Jia
>
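
To make the resolution point above concrete, here is a minimal sketch of the
arithmetic. It is not code from ring_perf_autotest; the helper names
(ticks_per_iteration, ticks_to_ns, min_iterations) are hypothetical and exist
only for illustration. The only number taken from this thread is the 20000000
Hz counter frequency reported by arch_timer_get_cntfrq().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average counter ticks spent per iteration.
 * The cast to double comes before the division, so the result is a
 * fraction rather than a truncated integer. */
static inline double ticks_per_iteration(uint64_t start, uint64_t end,
                                         uint64_t iterations)
{
    assert(iterations > 0);
    return (double)(end - start) / (double)iterations;
}

/* Hypothetical helper: convert counter ticks to nanoseconds for a
 * counter running at freq_hz (e.g. 20000000 for a 20 MHz counter). */
static inline double ticks_to_ns(double ticks, uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return ticks * 1e9 / (double)freq_hz;
}

/* Hypothetical helper: smallest iteration count for which an operation
 * costing op_ns nanoseconds advances the counter by at least min_ticks
 * ticks, so the measured delta is not dominated by counter granularity. */
static inline uint64_t min_iterations(double op_ns, uint64_t freq_hz,
                                      uint64_t min_ticks)
{
    assert(op_ns > 0.0 && freq_hz > 0);
    double ticks_per_op = op_ns * (double)freq_hz / 1e9;
    double needed = (double)min_ticks / ticks_per_op;
    uint64_t whole = (uint64_t)needed;

    /* Round up so the requested tick count is actually reached. */
    return (needed > (double)whole) ? whole + 1 : whole;
}
```

With a 20 MHz counter, ticks_to_ns(1.0, 20000000) is 50 ns per tick, so an
operation that costs less than that shows up as a small fraction of a tick per
iteration and prints as 0.00 with "%.2F". As an assumed example, an operation
costing 25 ns would need min_iterations(25.0, 20000000, 100) = 200 iterations
just to move the counter by 100 ticks.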