On Fri, 23 Jan 2004, Matthew Dillon wrote:

> I tracked down an occasional buildworld failure on DragonFly to my
> XL driver, which is synchronized to 4.x's XL driver.

It would be very helpful if you could do the following:

(1) See if you can reproduce this using something other than NFS --
    perhaps netperf using UDP_STREAM or the like, between that machine
    and another machine.  This would give us a more reproducible
    workload than "builds", and hopefully one that is less sensitive to
    things like context switching.

(2) See if you can reproduce this with a stock 4.9-RELEASE kernel (or
    4-STABLE).  While the drivers are similar between 4.x and DFBSD,
    there are actually quite a few structural changes in the DFBSD
    version.  Maybe it would make sense to try backing out the local
    DFBSD changes to the base FreeBSD version, even without moving to a
    complete FreeBSD system, to see if they are the cause.  It's
    difficult to diff the two because of reorganization and style
    changes.

> [EMAIL PROTECTED]:6:0: class=0x020000 card=0x764610b7 chip=0x764610b7 rev=0x30
> hdr=0x00

Does this card have a product name, or is it one of those chips embedded
in a motherboard without a separate name?

I took a look through the xl cards/chips on my various machines, and was
unable to find anything that had remotely the same card or chip ID.  I
did some high-volume packet flows between them with hardware
checksumming disabled and didn't see any corrupted UDP packets, but the
workloads I'm using sound pretty different.  Knowing whether it can be
reproduced using a simpler workload (and the specifics) would be good.

FYI, I checked the Linux driver for these cards, and didn't see mention
of any quirks for the particular chips/card you're using.  The only
thing of note in the Linux driver was the following:

    /* Check the PCI latency value.  On the 3c590 series the latency timer
       must be set to the maximum value to avoid data corruption that occurs
       when the timer expires during a transfer.  This bug exists the Vortex
       chip only. */
    if (pdev) {
        u8 pci_latency;
        u8 new_latency = (drv_flags & IS_VORTEX) ? 248 : 32;

        pci_read_config_byte(pdev, PCI_LATENCY_TIMER, &pci_latency);
        if (pci_latency < new_latency) {
            printk(KERN_INFO "%s: Overriding PCI latency"
                   " timer (CFLT) setting of %d, new value is %d.\n",
                   dev->name, pci_latency, new_latency);
            pci_write_config_byte(pdev, PCI_LATENCY_TIMER, new_latency);
        }
    }

The rate at which you have failures sounds like it could be a similar
issue, however -- an occasional collision between a timer and DMA.  NFS
is often a mix of small RPCs handling lookups and attributes, and larger
RPCs carrying data.  Using netperf or a related tool might help you
identify if one of those is more likely to cause the failure.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]      Senior Research Scientist, McAfee Research
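
For reference, the decision made by the Linux latency check quoted above can be pulled out into a small standalone function, which makes it easy to see what value a given chip would end up with without touching PCI config space. This is only a sketch: the function name xl_pick_latency, the two constant names, and the self-check helper are hypothetical and are not taken from the Linux, FreeBSD, or DragonFly drivers. Only the values 248 and 32 and the "raise, never lower" rule come from the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the Linux excerpt (248 for Vortex, 32 otherwise). */
#define XL_LATENCY_MIN_VORTEX  248
#define XL_LATENCY_MIN_DEFAULT 32

/*
 * Return the latency timer value the check would leave in place:
 * the current value is raised to the minimum for the chip family
 * and is never lowered.
 */
static uint8_t
xl_pick_latency(uint8_t current, int is_vortex)
{
	uint8_t wanted = is_vortex ? XL_LATENCY_MIN_VORTEX : XL_LATENCY_MIN_DEFAULT;

	return (current < wanted) ? wanted : current;
}

/* Small self-check that exercises both chip families. */
static void
xl_latency_selfcheck(void)
{
	assert(xl_pick_latency(16, 0) == 32);   /* too low, raised to default minimum */
	assert(xl_pick_latency(64, 0) == 64);   /* already high enough, left alone */
	assert(xl_pick_latency(64, 1) == 248);  /* Vortex needs the larger minimum */
}
```

In a real driver the current value would come from the PCI latency timer register and the result would be written back only when it differs, as the Linux code does.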