Hi Prashant,

Maybe you could monitor RAM, QPI and PCIe activity with
http://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
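The PCM package ships a few small command-line monitors that can run next
to the DPDK application; a rough usage sketch (binary names and arguments
differ between PCM versions, so treat these lines as assumptions rather
than the exact invocation):

    ./pcm.x 1          # core/uncore utilization and QPI traffic, 1 s interval
    ./pcm-memory.x 1   # DRAM bandwidth per channel and per socket
    ./pcm-pcie.x 1     # PCIe read/write traffic per socket

Watching the QPI and remote-memory numbers while core 1 is dropping
packets should show fairly quickly whether the traffic is crossing
sockets.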
That may make it easier to investigate the issue. Two DPDK-side sketches
(NUMA-aware setup and an empty-poll load estimate) follow the quoted
thread below.

François-Frédéric

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Prashant Upadhyaya
> Sent: Wednesday, February 12, 2014 1:03 PM
> To: Etai Lev Ran
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] NUMA CPU Sockets and DPDK
>
> Hi Etai,
>
> Of course all DPDK threads consume 100% (unless some waits are introduced
> for power saving etc.; all typical DPDK threads are while(1) loops).
> When I said core 1 is unusually busy, I meant that it is not able to read
> beyond 2 Gbps or so and the packets are dropping at the NIC.
> (I have my own custom way of calculating the CPU utilization of core 1,
> based on how many empty polls were done and how many polls got me data
> which I then process.)
> On the 8-core machine with a single socket, core 1 was able to lift much
> higher data rates successfully, hence the question.
>
> Regards
> -Prashant
>
>
> -----Original Message-----
> From: Etai Lev Ran [mailto:elevran at gmail.com]
> Sent: Wednesday, February 12, 2014 5:18 PM
> To: Prashant Upadhyaya
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] NUMA CPU Sockets and DPDK
>
> Hi Prashant,
>
> Based on our experience, using DPDK across CPU sockets may indeed result
> in some performance degradation (~10% for our application vs. staying
> in-socket; YMMV based on HW, application structure, etc.).
>
> Regarding CPU utilization on core 1, the one picking up traffic: perhaps
> I had misunderstood your comment, but I would expect it to always be
> close to 100%, since it's polling the device via the PMD and not driven
> by interrupts.
>
> Regards,
> Etai
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Prashant Upadhyaya
> Sent: Wednesday, February 12, 2014 1:28 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] NUMA CPU Sockets and DPDK
>
> Hi guys,
>
> What has been your experience of using DPDK-based apps in NUMA mode with
> multiple sockets, where some cores are present on one socket and other
> cores on another socket?
>
> I am migrating my application from an Intel machine with 8 cores, all in
> one socket, to a 32-core machine where 16 cores are in one socket and the
> other 16 cores are in the second socket.
> My core 0 does all initialization for mbufs, NIC ports, queues etc. and
> uses SOCKET_ID_ANY for socket-related parameters.
>
> The use case works, but I think I am running into performance issues on
> the 32-core machine.
> The lscpu output on my 32-core machine shows the following:
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
> I am using core 1 to lift all the data from a single queue of an 82599EB
> port, and I see that the CPU utilization for this core 1 is way too high
> even for lifting traffic of 1 Gbps with a packet size of 650 bytes.
>
> In general, does one need to be careful in working with multiple sockets
> and so forth? Any comments would be helpful.
>
> Regards
> -Prashant
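As mentioned above, two rough DPDK-side sketches.

First, NUMA-aware initialization. From the lscpu output, lcore 1 sits on
node 1, so if the 82599EB is attached to socket 0, every descriptor and
mbuf access from core 1 crosses QPI. Deriving the socket from the port
instead of passing SOCKET_ID_ANY keeps the pool and the RX ring in
node-local memory. This is only a sketch against the usual ethdev/mempool
calls; the port/queue numbers and pool sizes are placeholders, not your
real configuration:

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    #define NB_MBUF   8192
    #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

    static struct rte_mempool *rx_pool;

    static int
    setup_rx_on_port_socket(uint8_t port_id, uint16_t queue_id,
                            const struct rte_eth_rxconf *rx_conf)
    {
            /* NUMA node the NIC is attached to; fall back to the caller's
             * node if the information is not available */
            int sock = rte_eth_dev_socket_id(port_id);
            if (sock < 0)
                    sock = rte_socket_id();

            /* mbuf pool in memory local to the NIC instead of SOCKET_ID_ANY */
            rx_pool = rte_mempool_create("rx_pool", NB_MBUF, MBUF_SIZE, 32,
                            sizeof(struct rte_pktmbuf_pool_private),
                            rte_pktmbuf_pool_init, NULL,
                            rte_pktmbuf_init, NULL,
                            sock, 0);
            if (rx_pool == NULL)
                    return -1;

            /* RX descriptor ring allocated on the same node */
            return rte_eth_rx_queue_setup(port_id, queue_id, 128, sock,
                                          rx_conf, rx_pool);
    }

It is also worth comparing rte_eth_dev_socket_id(port) with
rte_lcore_to_socket_id(1): if they differ, moving the polling thread to an
lcore on the NIC's node (the even-numbered CPUs in your lscpu output, if
the NIC hangs off socket 0) is likely to help at least as much as the
allocation change.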
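Second, since you estimate core 1's utilization from empty vs. non-empty
polls, something along these lines can be printed once a second and read
side by side with the PCM output (only an illustration of that kind of
accounting, not your code; the "processing" step is a placeholder that
just frees the mbufs):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <rte_cycles.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void
    rx_loop(uint8_t port_id, uint16_t queue_id)
    {
            struct rte_mbuf *bufs[BURST_SIZE];
            uint64_t polls = 0, nonempty = 0;
            uint64_t next = rte_get_timer_cycles() + rte_get_timer_hz();
            uint16_t i, n;

            for (;;) {
                    n = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
                    polls++;
                    if (n > 0) {
                            nonempty++;
                            /* placeholder processing: just drop the packets */
                            for (i = 0; i < n; i++)
                                    rte_pktmbuf_free(bufs[i]);
                    }

                    /* roughly once per second, report the non-empty poll ratio */
                    if (rte_get_timer_cycles() >= next) {
                            printf("lcore load ~%.1f%% (%" PRIu64 "/%" PRIu64
                                   " polls non-empty)\n",
                                   100.0 * nonempty / polls, nonempty, polls);
                            polls = nonempty = 0;
                            next = rte_get_timer_cycles() + rte_get_timer_hz();
                    }
            }
    }

If that ratio stays modest while the NIC still drops packets, the
bottleneck is more likely the cross-socket memory/QPI path than raw CPU
cycles on core 1.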