jamal wrote:
On Thu, 2007-06-09 at 15:16 +0100, James Chapman wrote:

>> First, do we need to encourage consistency in NAPI poll drivers?

Not to stifle the discussion, but Stephen Hemminger is planning to
write a new howto; that would be a good time to bring up the topic. The
challenge is that there may be hardware issues that result in small
deviations.

Ok.

When a device is in polled mode while idle, there are two scheduling
cases to consider:

1. One or more other netdevs are not idle and consume quota on each poll. The net_rx softirq loops until the next jiffy tick or until the quota is exceeded, calling each device in its poll list. Since the idle device is still in the poll list, it will be polled very rapidly.

One suggestion for limiting the number of polls is to actually have the
driver chew something off the quota even on empty polls - easiest done
by just changing the driver. A simple case would be to charge, say, 1
packet (more may make sense, machine dependent) every time poll is
invoked by the core.

I wanted to minimize the impact on devices that do have work to do. But it's worth investigating. Thanks for the suggestion.

In testing, I see significant reduction in interrupt rate for typical traffic patterns. A flood ping, for example, keeps the device in polled mode, generating no interrupts.

Must be a fast machine.

Not really. I used 3-year-old, single CPU x86 boxes with e100 interfaces. The idle poll change keeps them in polled mode. Without idle poll, I get twice as many interrupts as packets, one for txdone and one for rx. NAPI is continuously scheduled in/out.

In a test, 8510 packets are sent/received versus 6200 previously;

The other packets are dropped?

No. Since I did a flood ping from the machine under test, the improved latency meant that the ping response was handled more quickly, causing the next packet to be sent sooner. So more packets were transmitted in the allotted time (10 seconds).

What are the rtt numbers like?

With current NAPI:
rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 1.611/1.421 ms

With idle poll changes:
rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 1.175/1.236 ms

CPU load is 100% versus 62% previously;

not good.

But the CPU has done more work. The flood ping will always show increased CPU with these changes because the driver always stays in the NAPI poll list. For typical LAN traffic, the average CPU usage doesn't increase as much, though more measurements would be useful.

Your results above showed decreased tput and increased cpu - did you
mistype that?

I didn't use clear English. :) I'm seeing increased throughput, mostly because latency is improved. The increased cpu is partly because of the increased throughput, and partly because ksoftirqd stays busy longer.

despite the CPU load being increased. For a system whose main job is processing network traffic quickly, like an embedded router or a network server, this approach might be very beneficial.

I am not sure I buy that, James ;-> The router types really don't have
much of a challenge in this area.

The problem I started thinking about was the one where NAPI thrashes in/out of polled mode at higher and higher rates as network interface speeds and CPU speeds increase. A flood ping demonstrates this even on 100M links on my boxes. Networking boxes want consistent performance/latency for all traffic patterns and they need to avoid interrupt livelock. Current practice seems to be to use hardware interrupt mitigation or timers to limit interrupt rate but this just hurts latency, as you noted. So I'm trying to find a way to limit the NAPI interrupt rate without increasing latency. My comment about this approach being suitable for routers and networked servers is that these boxes care more about minimizing packet latency than they do about wasting CPU cycles by polling idle devices.

You are doing the right thing by following the path of performance
analysis. I hope you don't get discouraged because the return on
investment may be very low in such work - the majority of the work is in
the testing and analysis (not in puking code endlessly).

Thanks for your feedback. The challenge will be finding the time to do this work. :)

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
