Hi Jerin

On 10/25/2017 9:26 PM, Jerin Jacob Wrote:
-----Original Message-----
Date: Tue, 24 Oct 2017 10:04:26 +0800
From: Jia He <hejia...@gmail.com>
To: Jerin Jacob <jerin.ja...@caviumnetworks.com>
Cc: "Ananyev, Konstantin" <konstantin.anan...@intel.com>, "Zhao, Bing"
  <iloveth...@163.com>, Olivier MATZ <olivier.m...@6wind.com>,
  "dev@dpdk.org" <dev@dpdk.org>, "jia...@hxt-semitech.com"
  <jia...@hxt-semitech.com>, "jie2....@hxt-semitech.com"
  <jie2....@hxt-semitech.com>, "bing.z...@hxt-semitech.com"
  <bing.z...@hxt-semitech.com>, "Richardson, Bruce"
  <bruce.richard...@intel.com>
Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
  loading when doing enqueue/dequeue
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
  Thunderbird/52.4.0

Hi Jerin
Hi Jia,


example:
./build/app/test -c 0xff -n 4
ring_perf_autotest
It seems that on our arm64 server, ring_perf_autotest finishes in a few
seconds:
Yes, it needs just a few seconds.

Anything wrong about configuration or environment setup?
By default, arm64 + DPDK uses the EL0 counter to measure cycles. I
think that on your SoC it runs at 50 MHz or 100 MHz, so you can
follow the scheme below to get accurate cycle measurements:

See: http://dpdk.org/doc/guides/prog_guide/profile_app.html
check: 44.2.2. High-resolution cycle counter
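
To illustrate why a slow counter produces results of 0, here is a minimal
sketch of the arithmetic. It assumes a 100 MHz counter and a 2 GHz CPU clock;
both frequencies and the helper names ticks_per_op and ticks_to_cpu_cycles
are hypothetical and not part of DPDK or its test code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-operation cost in counter ticks, using integer division the way an
 * integer-formatted result line would report it. Anything cheaper than
 * one tick per operation truncates to 0.
 */
static uint64_t ticks_per_op(uint64_t total_ticks, uint64_t iterations)
{
    assert(iterations > 0);
    return total_ticks / iterations;
}

/*
 * Scale a tick count from a slow counter to approximate CPU cycles.
 * counter_hz and cpu_hz are assumptions supplied by the caller; this
 * sketch does not read them from hardware.
 */
static double ticks_to_cpu_cycles(double ticks, uint64_t counter_hz,
                                  uint64_t cpu_hz)
{
    assert(counter_hz > 0);
    return ticks * ((double)cpu_hz / (double)counter_hz);
}
```

Under these assumptions one counter tick spans 20 CPU cycles, so an operation
costing 10 CPU cycles averages 0.5 ticks: ticks_per_op(8388608, 16777216)
returns 0, while ticks_to_cpu_cycles(0.5, 100000000, 2000000000) recovers 10.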
Thank you for the suggestions.
But when I used the kernel module you pointed to, in order to enable accurate
cycle measurement in user space, the test results were as below:

root@nfv-demo01:~/dpdk/build/build/test/test# lsmod |grep pmu
pmu_el0_cycle_counter   262144  0
[old code, without any patches]
============================================
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 0
MP/MC single enq/dequeue: 0
SP/SC burst enq/dequeue (size: 8): 0
MP/MC burst enq/dequeue (size: 8): 0
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.00
MC empty dequeue: 0.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00
Test OK

[with full rte_smp_rmb barrier patch]
======================================
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 0
MP/MC single enq/dequeue: 0
SP/SC burst enq/dequeue (size: 8): 0
MP/MC burst enq/dequeue (size: 8): 0
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.00
MC empty dequeue: 0.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 0.00
MP/MC bulk enq/dequeue (size: 8): 0.00
SP/SC bulk enq/dequeue (size: 32): 0.00
MP/MC bulk enq/dequeue (size: 32): 0.00
Test OK
RTE>>

No difference; all the times are 0?

If I rmmod pmu_el0_cycle_counter and edit ./build/.config to comment out the config line
#CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y

then the times are greater than 0:

root@ubuntu:/home/hj/dpdk/build/build/test/test# ./test -c 0xff -n 4
EAL: Detected 44 lcore(s)
EAL: Probing VFIO support...
APP: HPET is not enabled, using TSC as default timer
RTE>>per_lcore_autotest
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 0
MP/MC single enq/dequeue: 2
SP/SC burst enq/dequeue (size: 8): 0
If you follow the above link, the value '0' will be replaced with more
meaningful data.

MP/MC burst enq/dequeue (size: 8): 0
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.02
MC empty dequeue: 0.04

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 0.12
MP/MC bulk enq/dequeue (size: 8): 0.31
SP/SC bulk enq/dequeue (size: 32): 0.05
MP/MC bulk enq/dequeue (size: 32): 0.09

### Testing using two hyperthreads ###
SP/SC bulk enq/dequeue (size: 8): 0.12
MP/MC bulk enq/dequeue (size: 8): 0.39
SP/SC bulk enq/dequeue (size: 32): 0.04
MP/MC bulk enq/dequeue (size: 32): 0.12

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 0.37
MP/MC bulk enq/dequeue (size: 8): 0.92
SP/SC bulk enq/dequeue (size: 32): 0.12
MP/MC bulk enq/dequeue (size: 32): 0.26
Test OK
RTE>>

Cheers,
Jia
--
Cheers,
Jia
