Hi Eric,
I CCed netdev since this stuff is about network and not lkml.
Ok, dropped the CC...
What kind of machine do you have? SMP or not?
It's an HP system with two dual-core CPUs at 3GHz; the storage system is connected through a QLogic FC-HBA. It should really be fast enough to handle a data stream of 50 MB/s...
If you have many sockets on this machine, lsof can be very slow reading /proc/net/tcp and/or /proc/net/udp, locking some tables long enough to drop packets.
First I tried with one UDP socket, and during tests I switched to 16 sockets with no effect. As I removed nearly all daemons, there aren't many open sockets. /proc/net/tcp seems to be one cause of the problem: a simple "cat /proc/net/tcp" nearly always leads to immediate UDP packet loss. So it seems that reading TCP statistics blocks UDP packet processing. As it isn't my goal to collect statistics all the time, I could live with disabling access to /proc/net/tcp, but I wouldn't call this a good solution...
If you have a low count of tcp sockets, you might want to boot with thash_entries=2048 or so, to reduce tcp hash table size.
This helped a lot. I tried thash_entries=10, and now only a while loop around the "cat ...tcp" triggers packet loss. Tests are now running and I can say more tomorrow. Getting information about thash_entries is really hard, even finding out the default value: for a system with 2GB RAM it could be around 100000.
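One way to check the hash table size actually in effect is to look for the sizing line the TCP code prints at boot. The sketch below parses that line from dmesg text. The message format is an assumption based on 2.6-era kernels and may be worded differently elsewhere; the function name and the sample line are hypothetical and were not captured on this machine.

```python
import re

# Assumed boot message format (2.6-era kernels); other versions may differ.
HASH_LINE = re.compile(
    r"TCP: Hash tables configured \(established (\d+) bind (\d+)\)"
)

def tcp_hash_sizes(dmesg_text):
    """Return (established, bind) hash sizes from boot log text, or None."""
    match = HASH_LINE.search(dmesg_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Illustrative line only, NOT taken from the system discussed in this thread.
SAMPLE = "TCP: Hash tables configured (established 131072 bind 65536)"
print(tcp_hash_sizes(SAMPLE))
```

Feeding it the output of dmesg before and after booting with thash_entries=... should show whether the parameter took effect.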
No RcvbufErrors as well?
The kernel is a bit too old (2.6.18). Looking at the patch from 2.6.18 to 2.6.19 I found that RcvbufErrors is only increased when InErrors is increased. So my answer would be yes.
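To watch these counters around a test run, the Udp lines of /proc/net/snmp can be parsed into a dictionary and compared before and after. This is only a minimal sketch: the function names are made up for illustration, the sample text uses invented values, and kernels such as 2.6.18 simply will not have the RcvbufErrors column, which the code tolerates.

```python
def parse_udp_counters(snmp_text):
    """Return the UDP counters from the text of /proc/net/snmp as a dict."""
    rows = [ln.split() for ln in snmp_text.splitlines() if ln.startswith("Udp:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]  # header row, then value row
    return {name: int(value) for name, value in zip(names, values)}

def error_delta(before, after, keys=("InErrors", "RcvbufErrors")):
    """Difference of the error counters present in both snapshots."""
    return {k: after[k] - before[k] for k in keys if k in before and k in after}

# Invented sample in the layout of newer kernels (values are illustrative).
SAMPLE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1000 2 42 10 42 0\n"
)
counters = parse_udp_counters(SAMPLE)
print(counters["InErrors"], counters.get("RcvbufErrors"))
```

On a live system the text would come from reading /proc/net/snmp once before and once after the "cat /proc/net/tcp" test, then passing both snapshots to error_delta.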
> - Network card is handled by bnx2 kernel module
I don't know this NIC; does it support ethtool?
It is a "Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)", and it seems ethtool is supported. The output below was captured after packet loss (I don't see any hints, but maybe you do):
ethtool -S eth0
NIC statistics:
     rx_bytes: 155481467364
     rx_error_bytes: 0
     tx_bytes: 5492161
     tx_error_bytes: 0
     rx_ucast_packets: 18341
     rx_mcast_packets: 137321933
     rx_bcast_packets: 2380
     tx_ucast_packets: 14416
     tx_mcast_packets: 190
     tx_bcast_packets: 8
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 244575
     rx_65_to_127_byte_packets: 6828
     rx_128_to_255_byte_packets: 167
     rx_256_to_511_byte_packets: 94
     rx_512_to_1023_byte_packets: 393
     rx_1024_to_1522_byte_packets: 137090597
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 52
     tx_65_to_127_byte_packets: 7547
     tx_128_to_255_byte_packets: 3304
     tx_256_to_511_byte_packets: 399
     tx_512_to_1023_byte_packets: 897
     tx_1024_to_1522_byte_packets: 2415
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 158816
     rx_discards: 0
     rx_fw_discards: 0
ethtool -c eth0
Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 18
rx-frames: 6
rx-usecs-irq: 18
rx-frames-irq: 6

tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 80
tx-frames-irq: 20

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
ethtool -g eth0
Ring parameters for eth1:
Pre-set maximums:
RX:             1020
RX Mini:        0
RX Jumbo:       0
TX:             255
Current hardware settings:
RX:             100
RX Mini:        0
RX Jumbo:       0
TX:             255
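As a back-of-the-envelope check, the ring sizes above can be turned into the time the RX ring could absorb traffic if the host stopped servicing it. The sketch assumes one frame per ring entry, takes the average frame size from the rx counters in the "ethtool -S" output, and treats 50 MB/s as 50 * 10^6 bytes/s; the function name is hypothetical and none of this proves the ring is where packets are lost (rx_discards is 0).

```python
def ring_absorb_ms(ring_entries, rate_bytes_per_s, avg_frame_bytes):
    """Milliseconds of traffic a ring holds, assuming one frame per entry."""
    packets_per_s = rate_bytes_per_s / avg_frame_bytes
    return 1000.0 * ring_entries / packets_per_s

# Counters copied from the "ethtool -S eth0" output in this mail.
RX_BYTES = 155481467364
RX_PACKETS = 18341 + 137321933 + 2380   # ucast + mcast + bcast
AVG_FRAME = RX_BYTES / RX_PACKETS        # roughly 1.1 kB per frame

RATE = 50 * 1000 * 1000                  # assumed: 50 MB/s as 50e6 bytes/s

for entries in (100, 1020):              # current RX ring, pre-set maximum
    print(entries, "entries:", round(ring_absorb_ms(entries, RATE, AVG_FRAME), 2), "ms")
```

Under these assumptions the current setting covers only a couple of milliseconds of stalled processing, and the maximum about ten times that. If the driver allows it, the ring could be raised with "ethtool -G eth0 rx 1020".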
Just to make sure, does your application set up a large enough SO_RCVBUF value?
Yes, my first try with one socket was 5MB, but I also tested with 10 and even 25MB. With 16 sockets I also set it to 5MB. When pausing the application netstat shows the filled buffers.
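For completeness, this is roughly how the requested and the granted buffer size can be compared. On Linux the kernel silently caps an SO_RCVBUF request at net.core.rmem_max, and getsockopt() reports a doubled value to account for bookkeeping overhead, so reading the option back is the only way to see what was really applied. The helper name is made up for this sketch, and it is written in Python rather than in the language of the actual application.

```python
import socket

def open_udp_socket_with_rcvbuf(requested_bytes):
    """Create a UDP socket, request a receive buffer, return (sock, granted)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    # Read the value back: the kernel may have capped (and, on Linux, doubled) it.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted

requested = 5 * 1024 * 1024   # 5MB, as in the first test described above
sock, granted = open_udp_socket_with_rcvbuf(requested)
print("requested", requested, "granted", granted)
sock.close()
```

If the granted value is far below the request, net.core.rmem_max is the limit to look at; the tcp_rmem values do not apply to UDP sockets.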
What values do you have in /proc/sys/net/ipv4/tcp_rmem ?
I kept the default values there: 4096 43689 87378
cat /proc/meminfo
MemTotal:      2060664 kB
MemFree:        146536 kB
Buffers:         10984 kB
Cached:        1667740 kB
SwapCached:          0 kB
Active:         255228 kB
Inactive:      1536352 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      2060664 kB
LowFree:        146536 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:          820740 kB
Writeback:         112 kB
Mapped:         127612 kB
Slab:           104184 kB
CommitLimit:   1030332 kB
Committed_AS:   774944 kB
PageTables:       1928 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      6924 kB
VmallocChunk: 34359731259 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Thanks for your help!

Regards,
John