Hello, I have finished porting the pcnet32 driver from Mach to user space, and here is a brief report of the performance measurements. These were only rough measurements: the Hurd runs in a VMware virtual machine and only receives data, while the data sender runs on the host machine, which is Mac OS X. The processor of my machine is a 2.4 GHz Intel Core 2 Duo. Of course, I am aware that the data sender should not run on the same physical machine as the Hurd, and that it would be better to test the driver with the Hurd running on real hardware. But the purpose of the experiment is to show that the user-level driver still has fairly good performance compared with the in-kernel driver, not to measure its performance accurately.
I measured the performance with both TCP and UDP traffic. There are two experiments and each one was run three times; all numbers shown below are averages. I compare the user-level driver with the in-kernel pcnet32 driver.

Experiment 1: sending 100 MB of data from the host machine to the Hurd over TCP. I only count the data transfer rate (i.e., excluding the size of packet headers).

                         in-kernel      user-level
average rate:            2.3 MB/s       2.4 MB/s
peak rate:               3.9 MB/s       2.6 MB/s

Experiment 2: sending 1,000,000 UDP packets of minimum size (60 bytes including packet headers) within 30 seconds.

                              in-kernel      user-level
number of received packets:   46,959         68,174

As we can see, both drivers perform quite badly in Experiment 2, but the result is still really unexpected. I think it can be explained as follows: the in-kernel driver puts received packets in a queue so that the software interrupt can pick them up and deliver them to user space. If the queue is full, received packets are simply dropped. When a large number of packets rush to the Ethernet card in a very short time, most of the CPU time is spent in the interrupt handler fetching packets from the card, only for the handler to fail to put them in the queue for further delivery. As a result, most of the CPU time is wasted.

My user-level driver does not put received packets in a queue; instead, it calls mach_msg() to deliver them to pfinet directly (a rough sketch of this path is appended at the end of this mail). It is true that the user-level interrupt handler fails to fetch all packets from the card, so most packets are discarded by the card itself. But once a packet has been received by the driver, it is very likely to be delivered to pfinet, so little CPU time is wasted. (I have another implementation in which the user-level driver puts received packets in a queue before delivering them to pfinet, and that implementation performs extremely badly when lots of packets rush in.)

Anyway, this informal benchmark shows that the user-level pcnet32 driver does have reasonably good performance compared with the in-kernel driver.

Best regards,
Zheng Da
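
P.S. To make the "deliver directly to pfinet" path above concrete, here is a minimal sketch, not the actual driver code: the message layout, the message id, the port argument and the helper name deliver_to_pfinet are all assumptions made for this example; the real driver has to produce whatever message format pfinet expects from a network device.

/* Minimal sketch of the "deliver directly to pfinet" path -- NOT the
   actual driver code.  The message layout, the message id and the
   helper name are invented for illustration; the real driver must
   send the message format that pfinet expects from a network device.  */

#include <mach.h>
#include <string.h>

#define MAX_FRAME 1514          /* largest Ethernet frame we forward */

struct rcv_msg                  /* hypothetical inline message layout */
{
  mach_msg_header_t head;
  mach_msg_type_t packet_type;  /* typed-IPC descriptor for the data */
  char packet[MAX_FRAME];
};

/* Called for every frame the interrupt handler pulls off the card.
   Returns 0 if the message was handed to pfinet, -1 otherwise.  */
static int
deliver_to_pfinet (mach_port_t pfinet_port, const void *frame, size_t len)
{
  struct rcv_msg msg;
  mach_msg_size_t size;

  if (len > MAX_FRAME)
    return -1;

  /* Inline data in a typed Mach message is padded to a word boundary.  */
  size = sizeof (mach_msg_header_t) + sizeof (mach_msg_type_t)
         + ((len + 3) & ~3);

  msg.head.msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND, 0);
  msg.head.msgh_size = size;
  msg.head.msgh_remote_port = pfinet_port;
  msg.head.msgh_local_port = MACH_PORT_NULL;
  msg.head.msgh_id = 2999;      /* arbitrary id for this sketch */

  msg.packet_type.msgt_name = MACH_MSG_TYPE_CHAR;
  msg.packet_type.msgt_size = 8;
  msg.packet_type.msgt_number = len;
  msg.packet_type.msgt_inline = TRUE;
  msg.packet_type.msgt_longform = FALSE;
  msg.packet_type.msgt_deallocate = FALSE;
  msg.packet_type.msgt_unused = 0;

  memcpy (msg.packet, frame, len);

  /* Send without blocking: if pfinet cannot accept the message right
     now, drop this frame instead of stalling the receive path.  */
  return mach_msg (&msg.head, MACH_SEND_MSG | MACH_SEND_TIMEOUT,
                   size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL)
         == MACH_MSG_SUCCESS ? 0 : -1;
}

The point of the sketch is only the control flow: the frame goes straight from the interrupt handler into a mach_msg() send with a zero timeout, so a full destination costs one dropped frame rather than a pile-up of work that is thrown away later.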