Mag Gam put forth on 11/30/2010 5:17 AM:
> Stan,
>
> sorry for the late response.
>
> lspci gives me this about my Ethernet adapter.
>
> 04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>         Subsystem: Hewlett-Packard Company NC360T PCI Express Dual Port Gigabit Server Adapter
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 154
>         Region 0: Memory at fbfe0000 (32-bit, non-prefetchable) [size=128K]
>         Region 1: Memory at fbfc0000 (32-bit, non-prefetchable) [size=128K]
>         Region 2: I/O ports at 5000 [size=32]
>         [virtual] Expansion ROM at e6200000 [disabled] [size=128K]

The NC360T supports TCP checksum & segmentation offloading, but it doesn't
support full TCP/IP offloading like the iSCSI HBAs do. Even so, checksum
and segmentation offloading will yield a small packet latency improvement.
The latency gain, however, will be minuscule compared to what you can get
by optimizing your user land application, which is the source of the bulk
of your latency. User land always is.
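A quick sanity check on your end: "ethtool -k eth0" (substitute your
actual interface name) shows which of those offload flags the driver
currently has turned on. If you'd rather poll it from inside your own
tooling, here's a rough, untested sketch that does the same check from C
through the SIOCETHTOOL ioctl. The interface name is only a placeholder.

/*
 * Rough, untested sketch: read the driver's current TX checksum and TCP
 * segmentation offload flags via the SIOCETHTOOL ioctl, the same data
 * "ethtool -k" reports.  "eth0" is only a placeholder interface name.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

static void show_flag(int fd, struct ifreq *ifr, __u32 cmd, const char *name)
{
    struct ethtool_value ev = { .cmd = cmd };

    ifr->ifr_data = (void *)&ev;
    if (ioctl(fd, SIOCETHTOOL, ifr) < 0)
        perror(name);
    else
        printf("%s: %s\n", name, ev.data ? "on" : "off");
}

int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* placeholder NIC name */

    show_flag(fd, &ifr, ETHTOOL_GTXCSUM, "tx-checksumming");
    show_flag(fd, &ifr, ETHTOOL_GTSO, "tcp-segmentation-offload");

    close(fd);
    return 0;
}

If anything shows "off", "ethtool -K eth0 tx on tso on" will turn it back
on, assuming the driver and firmware support it.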
> The target application is an OTP (online transaction processing)
> system which is driven by CICS. The volume may be high, but latency is
> important. The application is CPU bound (80% on 1 core), but not
> disk I/O or memory bound. All of our servers have 16GB of memory with
> 8 cores. The application is written in C (compiled with an Intel-based
> compiler). We are using a DNS cache solution and sometimes hardcoding
> /etc/hosts to avoid any DNS. It does not do too many DNS lookups.

80% of 1 core? Is this under a synthetic high-transaction-rate test load
against the app? What latencies are you measuring per transaction, at the
application level, at 80% load on that core? (If you haven't instrumented
that yet, see the P.S. at the bottom.)

> I am really interested in offloading TCP/IP from the kernel and having
> the NIC do it. I have read
> http://fiz.stanford.edu:8081/display/ramcloud/Low+latency+RPCs and it
> seems very promising.

Like I said, optimization here is probably going to gain you very little
in decreased latency. You need to focus your optimization on the hot code
paths of the server application.

Can you tell us more about what this application actually does? Is it
totally CPU and network bound? You say it's slaved to CICS, so I assume
you're pulling data from the mainframe CICS database over TCP/IP,
manipulating it, and writing data back to the mainframe. Is this correct?
Which mainframe model is this? Have you measured the latencies there?

Given the mainframe business, and the fact that many orgs hang onto
them... forever, it's very likely that your Linux server is 2-5 times
faster than the mainframe on a per-core basis, and that the mainframe is
introducing the latency because it can't keep up with your Linux server.
That's merely speculation on my part at this point.

--
Stan
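P.S. If the per-transaction latency hasn't been instrumented yet, the
cheapest way to get real numbers, and to separate your own processing
time from the round trip to CICS, is to bracket the transaction path with
CLOCK_MONOTONIC timestamps. Rough sketch only: process_transaction() below
is a made-up stand-in for whatever your real per-transaction code path is,
and older glibc needs -lrt for clock_gettime().

/*
 * Rough sketch: time one pass through the transaction path with a
 * monotonic clock (immune to NTP/wall-clock jumps).
 * process_transaction() is a placeholder, not part of any real API.
 */
#include <stdio.h>
#include <time.h>

static void process_transaction(void)
{
    /* stand-in for the real CPU/network-bound transaction work */
}

static double elapsed_usec(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    process_transaction();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("transaction took %.1f usec\n", elapsed_usec(&t0, &t1));
    return 0;
}

Logging those numbers per transaction under your 80%-of-a-core test load
should tell you pretty quickly whether the time is going into your code or
into waiting on the mainframe.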