Bruce Evans wrote:
On Mon, 7 Jul 2008, Andre Oppermann wrote:

Bruce Evans wrote:
What are the other overheads?  I calculate 1.644Mpps counting the inter-frame
gap, with 64-byte packets and (64 - header_size) bytes of payload.  If the
64 bytes are the payload alone, then the max is much lower.

The theoretical maximum at 64-byte frames is 1,488,100 pps.  I've looked
up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.

Where is the extra?  I still get 1.644736 Mpps (10^9/(8*64+96));
1.488095 Mpps is what you get with 64 bits extra (10^9/(8*64+96+64)).

The preamble has 64 bits and is in addition to the inter-frame gap.
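
For reference, here is the arithmetic as a small C program (a minimal
sketch; the 96-bit inter-frame gap and 64-bit preamble+SFD are the
standard Ethernet figures used above):

#include <stdio.h>

/*
 * Theoretical GigE packet rates for 64-byte frames.  Wire cost per
 * frame is the frame itself plus the inter-frame gap, and, when
 * counted, the preamble + start-of-frame delimiter.
 */
int
main(void)
{
	double line_rate = 1e9;		/* bits per second */
	double frame	 = 8.0 * 64;	/* 64-byte frame, in bits */
	double ifg	 = 96;		/* inter-frame gap, bits */
	double preamble	 = 64;		/* preamble + SFD, bits */

	/* 10^9/(8*64+96) ~= 1.6447 Mpps */
	printf("without preamble: %.0f pps\n", line_rate / (frame + ifg));
	/* 10^9/(8*64+96+64) ~= 1.4881 Mpps */
	printf("with preamble:    %.0f pps\n",
	    line_rate / (frame + ifg + preamble));
	return (0);
}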

I hoped to reach 1Mpps with the hardware I mentioned a few mails back,
but 2Mpps is far, far away.  Currently I get 160kpps over 32-bit/33MHz
PCI on a 1.2GHz Mobile Pentium.

This is more or less expected.  PCI32 is not able to sustain high
packet rates.  The bus setup times kill the speed.  For larger packets
the ratio gets much better and some reasonable throughput can be achieved.
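
A back-of-envelope model shows why.  In the sketch below the
per-transaction setup cost is an assumed figure for illustration only
(real arbitration and target latencies vary widely by chipset), and
descriptor DMA is ignored entirely:

#include <stdio.h>

/*
 * Rough model of 32-bit/33MHz PCI forwarding limits.  SETUP_CYCLES is
 * an assumption for illustration; real overheads vary.  Each forwarded
 * packet crosses the bus twice (RX DMA in, TX DMA out); descriptor
 * reads/writes add several more transactions per packet and are
 * ignored here.
 */
#define SETUP_CYCLES	6		/* assumed per-transaction overhead */

int
main(void)
{
	double clock = 33e6;		/* PCI clock, Hz */
	int data_cycles = 64 / 4;	/* 64 bytes at 4 bytes/cycle */
	double cycles = 2.0 * (SETUP_CYCLES + data_cycles);

	printf("~%.0f pps max\n", clock / cycles);
	return (0);
}

Even this optimistic model tops out around 750kpps for 64-byte packets;
real descriptor traffic and arbitration push the ceiling far lower,
which is consistent with the 160kpps observed.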

I get about 640 kpps without forwarding (sendto: slightly faster;
recvfrom: slightly slower) on a 2.2GHz A64.  Underclocking the memory
from 200MHz to 100MHz only reduces the speed by about 10%, while
removing a 10% CPU overclock reduces the speed by the same 10%, so the
system is apparently still mainly CPU-bound.

He's using a 1.2GHz Mobile Pentium on top of that.

Yes.  My example shows that FreeBSD is more CPU-bound than I/O-bound up
to CPUs considerably faster than a 1.2GHz Pentium (though the Pentium M
is fast relative to its clock speed).  The memory interface may matter
more than the CPU clock.

NetFPGA doesn't have enough TCAM space to be useful for real routing
(as in an Internet-sized routing table).  The trick many embedded
networking CPUs use is cache prefetching integrated with the network
controller: the first 64-128 bytes of every packet are transferred
automatically into the L2 cache by the hardware.  This allows
relatively slow CPUs (the 700MHz Broadcom BCM1250 in the Cisco NPE-G1,
or the 1.67GHz Freescale 7448 in the NPE-G2) to get more than 1Mpps.
Until something like this is possible on Intel or AMD x86 CPUs, we have
a ceiling set by RAM speed.
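
On x86 the closest software approximation is an explicit prefetch of
the next packet's header while the current one is being processed.  A
hypothetical sketch (rx_poll, ip_forward_one, and the pkt struct are
made-up names, not a real driver API):

#include <stddef.h>

/*
 * Hypothetical RX loop illustrating header prefetch.  rx_ring, nslots,
 * and ip_forward_one() stand in for real driver state and the real
 * forwarding path.
 */
struct pkt {
	void	*data;		/* packet buffer */
	size_t	 len;
};

void ip_forward_one(struct pkt *);

void
rx_poll(struct pkt **rx_ring, int nslots)
{
	int i;

	for (i = 0; i < nslots; i++) {
		/*
		 * Pull the next header's cache line in while the
		 * current packet is still being processed, hiding
		 * some of the memory latency.
		 */
		if (i + 1 < nslots)
			__builtin_prefetch(rx_ring[i + 1]->data, 0, 1);
		ip_forward_one(rx_ring[i]);
	}
}

Unlike the integrated hardware prefetch above, this only overlaps the
miss with one packet's worth of work, so it helps less at low rates.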

Does using faster memory (speed and/or latency) help here?  64 bytes
is so small that latency may be more of a problem, especially without
a prefetch.

Latency.  For IPv4 packet forwarding only one cache line per packet
is fetched.  Higher memory bandwidth only helps with the DMA from/to
the network card.

I use low-end memory, but the machine that does 640 kpps somehow has
almost 4 times lower latency than new FreeBSD cluster machines
(~42 nsec instead of ~150).  perfmon (fixed for AXP and A64) and hwpmc
report an average of 11 k8-dc-misses per sendto() while sending via
bge at 640 kpps.  11 * 42 accounts for 462 nsec out of the 1562 per
packet at this rate.  11 * 150 = 1650 would probably make this rate
unachievable despite the system having 20 times as much CPU and bus.
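
The budget arithmetic, spelled out as a quick check of the numbers
above:

#include <stdio.h>

/* Sanity check of the per-packet budget quoted above. */
int
main(void)
{
	double pps = 640e3;		/* observed sendto() rate */
	double budget = 1e9 / pps;	/* ns per packet, ~1562 */
	int misses = 11;		/* k8-dc-misses per sendto() */

	printf("budget:        %4.0f ns/packet\n", budget);
	printf("42 ns memory:  %4d ns in misses\n", misses * 42);
	printf("150 ns memory: %4d ns in misses\n", misses * 150);
	return (0);
}

At 150 nsec the cache misses alone (1650 nsec) would already exceed
the 1562 nsec per-packet budget.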

We were talking about routing here.  That is, a packet received on one
network interface and sent out on another.  It crosses the PCI bus twice.

--
Andre