On Thu, Dec 29, 2005 at 01:34:43AM +0100, Aritz Bastida wrote:

> Hello again,

Hi,


> Now I'm sending you the statistics I collected in the test I did.
> But first, another problem I didn't mention before, because it
> doesn't always happen. It's quite strange. I have used pktgen on
> quite a few machines, and on all of them, if you set clone_skb=1000
> or so, performance gets a boost. With the Pentium 4, however, it
> doesn't do anything. I mean:
> 
> clone_skb = 0       --> approx. 400 kpps
> clone_skb = 100000  --> approx. 400 kpps

That probably means you have enough CPU power, but the transaction
latency on the PCI bus is too high.  E.g. your NIC can't pull data over
the bus quickly enough, while the CPU spends most of its time doing
nothing, so it doesn't matter whether it has to clone a few packets in
that idle time or not.


> On the Pentium 3, on the other hand, I can see the performance boost
> in full (from 100 kpps with clone_skb=0 to 400 kpps with
> clone_skb=100000).

CPU time is probably more critical in this case.  You can get a feel
for the CPU/IO balance by trying a few intermediate values for
clone_skb.
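
A quick sweep could look something like this (just a sketch, assuming
eth0 is already bound to a pktgen thread; the exact /proc paths and
parameter names depend on your pktgen version):

    for n in 0 10 100 1000 10000 100000; do
        echo "clone_skb $n" > /proc/net/pktgen/eth0
        echo "count 1000000" > /proc/net/pktgen/eth0
        echo "start" > /proc/net/pktgen/pgctrl   # blocks until the run ends
        grep pps /proc/net/pktgen/eth0           # Result: line reports pps
    done

Where the curve flattens out tells you roughly where you stop being
CPU-bound and start being bus-bound.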


> Why are these results? From what you have already
> told me, I guess the bottleneck in the Pentium 4 is the PCI bus
> (33MHz), which cant send faster. As it's seen, the machine has enough
> time to alloc new skb's before the packets are sent. So the unique
> perceptible difference between the two would be the idle time. Am I
> right?

Sounds about right, but there are a number of other tools you can use
(for example, oprofile) to find out!
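
With oprofile, something along these lines would show where the CPU
time actually goes (a sketch, assuming an uncompressed vmlinux with
symbols sits at /boot/vmlinux; adjust the path to your setup):

    opcontrol --vmlinux=/boot/vmlinux
    opcontrol --start
    # ... run the pktgen test ...
    opcontrol --stop
    opreport -l | head -20    # top kernel symbols by sample count

If most of the samples land in the idle loop, the CPU really is just
waiting on the bus.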


> I hope these stats (combined with the information I provided in the
> previous email) will let you understand if the receiving machine has
> got HW_FLOW on or off.

Hard to be sure.  There are rx_discards, but that can also mean that
PAUSE didn't kick in quickly enough.  Can you check with ethtool to be
sure?
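
Something like this (assuming the receiver's interface is eth0):

    ethtool -a eth0    # shows autoneg/RX/TX pause settings

should tell you whether RX/TX flow control is actually negotiated on.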


> Last question:
> There are two stats of interest, dma_writeq_full and rx_discards
> (these stats are specific to the tg3 card):
>         ifstats.dma_writeq_full: 30066024
>         ifstats.rx_discards: 13517210
> As far as I understand, dma_writeq_full means that the card finds
> the rx_ring full and overwrites a previous packet (so that packet is
> lost). So how can the rx_discards (packets discarded) counter be less
> than the dma_writeq_full counter?

I'm just guessing here, but might it be that dma_writeq_full is the
number of discrete occasions where the NIC tried to transfer a packet
from its on-chip RAM to the host's RAM and didn't have space?  If
that's so, it's still possible that the host's receive buffers got
some space before the NIC's buffers overflowed so that no packets
were lost.  (And then rx_discards might be the number of occasions
where a packet arrived, and both the host's RX ring _and_ the NIC's
RX ring were full.)  As I said, just guessing.
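
Either way, watching both counters over a controlled run should help.
You can read the tg3 counters directly with ethtool (assuming eth0
again; the stat names below are the ones from your ifstats dump):

    ethtool -S eth0 | egrep 'rx_discards|dma_writeq_full'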


cheers,
Lennert