Hello, I have finished porting the pcnet32 driver from Mach to user space, and here is a brief report of the performance measurements. These were only rough measurements: the Hurd runs in a VMware virtual machine and only receives data, while the data sender runs on the host machine, which is Mac OS X. The processor of my machine is a 2.4 GHz Intel Core 2 Duo. Of course, I am aware that the data sender should not run on the same physical machine as the Hurd, and that it would be better to test the driver with the Hurd running on real hardware. But the purpose of the experiment is to show that the user-level driver still has fairly good performance compared with the in-kernel driver, not to measure its performance accurately.
I measured the performance with both TCP and UDP traffic. There are two experiments and each one was run three times; all numbers shown below are averages. I compare the user-level driver with the in-kernel pcnet32 driver.

Experiment 1: sending 100 MB of data from the host machine to the Hurd over TCP. I only count the data transfer rate (i.e., excluding the size of packet headers).

                         in-kernel      user-level
average rate:            2.3 MB/s       2.4 MB/s
peak rate:               3.9 MB/s       2.6 MB/s

Experiment 2: sending 1,000,000 UDP packets of minimum size (60 bytes including packet headers) within 30 seconds.

                              in-kernel      user-level
number of received packets:   46,959         68,174

As we can see, both drivers perform quite badly in Experiment 2, but the result is still really unexpected. I think it can be explained as follows: the in-kernel driver puts received packets in a queue so that the software interrupt can pick them up and deliver them to user space. If the queue is full, received packets are simply dropped. When a large number of packets rush to the Ethernet card in a very short time, most of the CPU time is spent in the interrupt handler fetching packets from the card, only for the handler to fail to put them in the queue for further delivery. As a result, most of the CPU time is wasted.

My user-level driver does not put received packets in a queue; instead, it calls mach_msg() to deliver them to pfinet directly (a rough sketch of this path is appended at the end of this mail). It is true that the user-level interrupt handler fails to fetch all packets from the card, so most packets are discarded by the card itself. But once a packet has been received by the driver, it is very likely to be delivered to pfinet, so little CPU time is wasted. (I have another implementation in which the user-level driver puts received packets in a queue before delivering them to pfinet, and that implementation performs extremely badly when lots of packets rush in.)

Anyway, this informal benchmark shows that the user-level pcnet32 driver does have reasonably good performance compared with the in-kernel driver.

Best regards,
Zheng Da
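
P.S. To make the "deliver directly to pfinet" path above concrete, here is a minimal sketch, not the actual driver code: the message layout, the message id, the port argument and the helper name deliver_to_pfinet are all assumptions made for this example; the real driver has to produce whatever message format pfinet expects from a network device.

/* Minimal sketch of the "deliver directly to pfinet" path -- NOT the
   actual driver code.  The message layout, the message id and the
   helper name are invented for illustration; the real driver must
   send the message format that pfinet expects from a network device.  */

#include <mach.h>
#include <string.h>

#define MAX_FRAME 1514          /* largest Ethernet frame we forward */

struct rcv_msg                  /* hypothetical inline message layout */
{
  mach_msg_header_t head;
  mach_msg_type_t packet_type;  /* typed-IPC descriptor for the data */
  char packet[MAX_FRAME];
};

/* Called for every frame the interrupt handler pulls off the card.
   Returns 0 if the message was handed to pfinet, -1 otherwise.  */
static int
deliver_to_pfinet (mach_port_t pfinet_port, const void *frame, size_t len)
{
  struct rcv_msg msg;
  mach_msg_size_t size;

  if (len > MAX_FRAME)
    return -1;

  /* Inline data in a typed Mach message is padded to a word boundary.  */
  size = sizeof (mach_msg_header_t) + sizeof (mach_msg_type_t)
         + ((len + 3) & ~3);

  msg.head.msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND, 0);
  msg.head.msgh_size = size;
  msg.head.msgh_remote_port = pfinet_port;
  msg.head.msgh_local_port = MACH_PORT_NULL;
  msg.head.msgh_id = 2999;      /* arbitrary id for this sketch */

  msg.packet_type.msgt_name = MACH_MSG_TYPE_CHAR;
  msg.packet_type.msgt_size = 8;
  msg.packet_type.msgt_number = len;
  msg.packet_type.msgt_inline = TRUE;
  msg.packet_type.msgt_longform = FALSE;
  msg.packet_type.msgt_deallocate = FALSE;
  msg.packet_type.msgt_unused = 0;

  memcpy (msg.packet, frame, len);

  /* Send without blocking: if pfinet cannot accept the message right
     now, drop this frame instead of stalling the receive path.  */
  return mach_msg (&msg.head, MACH_SEND_MSG | MACH_SEND_TIMEOUT,
                   size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL)
         == MACH_MSG_SUCCESS ? 0 : -1;
}

The point of the sketch is only the control flow: the frame goes straight from the interrupt handler into a mach_msg() send with a zero timeout, so a full destination costs one dropped frame rather than a pile-up of work that is thrown away later.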