On Thu, Dec 29, 2005 at 01:34:43AM +0100, Aritz Bastida wrote:
> Hello again,
Hi,

> Now I send you the statistics I have collected in the test I've done.
> But first, another problem I didn't mention before, because it doesn't
> always happen. But it's quite strange. I have used pktgen on quite a
> few machines, and on all of them, if you say clone_skb=1000 or so,
> performance gets a big boost. With the Pentium 4, however, it doesn't
> do anything. I mean:
>
> clone_skb = 0      --> approx. 400kpps
> clone_skb = 100000 --> approx. 400kpps

That probably means you have enough CPU power, but the transaction
latency on the PCI bus is too high. E.g. your NIC can't pull data over
the bus quickly enough, while the CPU spends most of its time doing
nothing, so it doesn't matter whether it has to clone a few packets in
that idle time or not.

> On the Pentium 3, on the other hand, I can see the performance boost
> in all its glory (from 100kpps with clone_skb=0 to 400kpps with
> clone_skb=100000).

CPU time is probably more critical in this case. You can get a feel
for the CPU/IO balance by trying a few intermediate values for
clone_skb (see the first sketch at the end of this mail).

> Why are these results? From what you have already told me, I guess
> the bottleneck in the Pentium 4 is the PCI bus (33MHz), which can't
> send any faster. Apparently, the machine has enough time to alloc new
> skb's before the packets are sent. So the only perceptible difference
> between the two would be the idle time. Am I right?

Sounds about right, but there are a number of other tools you can use
(for example, oprofile) to find out! (See the second sketch below.)

> I hope these stats (combined with the information I provided in the
> previous email) will let you understand whether the receiving machine
> has got HW_FLOW on or off.

Hard to be sure. There are rx_discards, but that can also mean that
PAUSE didn't kick in quickly enough. Can you check with ethtool to be
sure? (Third sketch below.)

> Last question: there are two stats of interest, dma_writeq_full and
> rx_discards (these stats are specific to the tg3 card):
>
> ifstats.dma_writeq_full: 30066024
> ifstats.rx_discards:     13517210
>
> As far as I can understand, dma_writeq_full means that the card finds
> the rx_ring full and overwrites a previous packet (so that packet is
> lost). So how can the rx_discards (packets discarded) counter be less
> than the dma_writeq_full counter?

I'm just guessing here, but might it be that dma_writeq_full is the
number of discrete occasions on which the NIC tried to transfer a
packet from its on-chip RAM to the host's RAM and didn't have space?
If that's so, it's still possible that the host's receive buffers got
some space back before the NIC's own buffers overflowed, so that no
packets were lost. (And then rx_discards might be the number of
occasions on which a packet arrived while both the host's RX ring
_and_ the NIC's RX ring were full.) As I said, just guessing.

cheers,
Lennert
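
P.S. Here's roughly what I mean by sweeping clone_skb. Just a minimal
sketch assuming the 2.6-style /proc/net/pktgen interface with a single
kpktgend_0 thread; eth1 and the destination IP/MAC are placeholders for
your own setup:

    #!/bin/sh
    # Sweep clone_skb and watch where the pps curve flattens out.
    # (modprobe pktgen first if it's built as a module.)
    PGDEV=/proc/net/pktgen

    pgset() { echo "$1" > "$2"; }

    pgset "rem_device_all" $PGDEV/kpktgend_0
    pgset "add_device eth1" $PGDEV/kpktgend_0

    for clones in 0 10 100 1000 10000 100000; do
        pgset "count 1000000"     $PGDEV/eth1
        pgset "clone_skb $clones" $PGDEV/eth1
        pgset "pkt_size 60"       $PGDEV/eth1
        pgset "dst 10.0.0.2"      $PGDEV/eth1
        pgset "dst_mac 00:11:22:33:44:55" $PGDEV/eth1
        echo "start" > $PGDEV/pgctrl   # blocks until the run finishes
        echo "clone_skb=$clones:"
        grep pps $PGDEV/eth1           # the Result: line includes pps
    done

If the P4 really is bus-bound, the pps figure should barely move across
the whole sweep; on the P3 it should climb and then flatten once the
CPU stops being the bottleneck.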
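
P.P.S. A quick oprofile recipe, assuming the opcontrol/opreport tools
are installed (the vmlinux path is a placeholder, and you can fall back
to --no-vmlinux if you don't have an uncompressed kernel image around):

    opcontrol --vmlinux=/boot/vmlinux
    opcontrol --start
    # ... run the pktgen test ...
    opcontrol --stop
    opreport -l | head -20     # top symbols by sample count

If the top of the profile is the idle loop (default_idle or similar),
the CPU really is spending its time waiting on I/O, which would back up
the PCI bus theory.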
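
P.P.P.S. The ethtool checks I had in mind (eth0 is a placeholder, and
-S needs a driver recent enough to export its statistics, which recent
tg3 does):

    # Is flow control negotiated/enabled?
    ethtool -a eth0

    # Force PAUSE on in both directions if it isn't:
    ethtool -A eth0 autoneg off rx on tx on

    # Dump the driver's own counters, including the ones you quoted:
    ethtool -S eth0 | egrep 'discard|dma|pause'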