Hi~ Bob,
Thanks for your response!
So you think it is a memory-usage problem rather than a QPI issue?
Does that mean that if I fix the memory-usage issue, the performance will
rise to the level I expected?

BTW, has anyone used DPDK on a NUMA system with cross-CPU traffic like in my case?
If so, could you tell me how you solved the problem?
If not, I would like to know whether DPDK allows this kind of usage in an
application at all.
If it does not, I will need to change the way I use DPDK.

That is a lot of questions; I hope someone can help me answer them.
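For reference, one common approach is to keep lcores, memory, and NICs on the same socket. A rough sketch of how to check placement (the PCI address below is a hypothetical placeholder, and the --socket-mem EAL option may not exist in v1.3.1; treat this as an assumption to verify against your DPDK version):

```shell
# Check which NUMA node a NIC is attached to (substitute your own
# PCI address from lspci -- 0000:03:00.0 is just a placeholder):
cat /sys/bus/pci/devices/0000:03:00.0/numa_node

# If your EAL supports it, hugepage memory can be reserved per socket
# (MB per NUMA node) so packet buffers stay local to the cores using them:
./l2fwd -c 0xc -n 4 --socket-mem 512,512 -- -q 1 -p 0xc
```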

On 09/04/2013 12:19 AM, Bob Chen wrote:
QPI bandwidth is definitely large enough, but QPI is only responsible for
the communication between separate CPU sockets. What you are actually doing
is accessing memory attached to the other socket, and you are probably not
even hitting the bandwidth limit. The latency can be caused by many factors
during a NUMA operation.
/Bob

------------------ Original Message ------------------
From: "Zachary" <zachary.jen at cas-well.com>
Sent: Monday, September 2, 2013, 11:22
To: "dev" <dev at dpdk.org>
Cc: "Yannic.Chou : 6808" <yannic.chou at cas-well.com>; "Alan Yu : 6632" <Alan.Yu at cas-well.com>
Subject: [dpdk-dev] DPDK & QPI performance issue in Romley platform.

Hi~

I have a question about a DPDK & QPI performance issue on the Romley platform.
Recently I used the DPDK example application, l2fwd, to test DPDK's
performance on my Romley platform.
When I run the test across CPU sockets, I find that the performance
decreases dramatically.
Is this expected? Is there any way to verify this phenomenon?

In my opinion there should be no such issue, because QPI has enough
bandwidth to handle this kind of case.
So I am surprised by my results and cannot explain them.
Could someone help me solve this problem?

Thanks a lot!


My testing environment is described below:

Platform:         Romley
CPU:                E5-2643 * 2
RAM:               Transcend 8GB PC3-1600 DDR3 * 8
OS:                 Fedora core 14
DPDK:            v1.3.1r2, example/l2fwd
Slot setting:
                      SlotA is controlled by CPU1 directly.
                      SlotB is controlled by CPU0 directly.

DPDK pre-setting:
a. BIOS setting:
    HT=disable
b. Kernel parameters
    isolcpus=2,3,6,7
    default_hugepagesz=1024M
    hugepagesz=1024M
    hugepages=16
c. OS setting:
    service avahi-daemon stop
    service NetworkManager stop
    service iptables stop
    service acpid stop
    selinux disable


Example program Command:
a. SlotB(CPU0) -> CPU1
    #>./l2fwd -c 0xc -n 4 -- -q 1 -p 0xc

b. SlotA(CPU1) -> CPU0
    #>./l2fwd -c 0xc0 -n 4 -- -q 1 -p 0xc0
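For reference, the -c core masks above are just hexadecimal bitmaps of lcore IDs. A small sketch (Python, not part of DPDK) of how such a mask is derived from a list of cores:

```python
def coremask(cores):
    """Build the hex core-mask string that DPDK's -c option expects:
    bit N set means lcore N is enabled."""
    mask = 0
    for c in cores:
        mask |= 1 << c
    return hex(mask)

# Cores 2 and 3 (isolated on CPU0 in this setup) -> 0xc
print(coremask([2, 3]))   # 0xc
# Cores 6 and 7 (isolated on CPU1) -> 0xc0
print(coremask([6, 7]))   # 0xc0
```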

Results (frame size 128 bytes):

    CPU Affinity | Slot A (CPU1) | Slot B (CPU0)
    -------------+---------------+--------------
    CPU0         | 15.9%         | 96.49%
    CPU1         | 90.88%        | 24.78%
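As a sanity check on the "QPI has enough bandwidth" claim, a back-of-envelope calculation (the link speed and QPI figure below are assumptions, not measurements from this test):

```python
# Does 128-byte-frame traffic come anywhere near QPI bandwidth?
FRAME = 128            # frame size used in the test (bytes)
WIRE_OVERHEAD = 20     # preamble + inter-frame gap (bytes)
LINK_BPS = 10e9        # assuming a 10 GbE port

pps = LINK_BPS / ((FRAME + WIRE_OVERHEAD) * 8)
data_rate_gbs = pps * FRAME / 1e9
print(f"{pps / 1e6:.2f} Mpps, {data_rate_gbs:.2f} GB/s of frame data")

# A Sandy Bridge-EP QPI link at 8 GT/s moves on the order of 16 GB/s per
# direction (assumed figure), so the frame data alone is far below the
# link capacity -- consistent with latency, not bandwidth, being the issue.
```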



This email may contain confidential information. Please do not use or
disclose it in any way and delete it if you are not the intended recipient.



--
Best Regards,
Zachary Jen

Software RD
CAS-WELL Inc.
8th Floor, No. 242, Bo-Ai St., Shu-Lin City, Taipei County 238, Taiwan
Tel: +886-2-7731-8888#6305
Fax: +886-2-7731-9988


