RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
> [0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [0.696487] ioatdma :00:04.0: channel error register unreachable

> I assume this is something Supermicro has to fix?

You are probably missing some kernel config option(s) :) - I did fight similar
issues on a Fujitsu SandyBridge Xeon based server.

Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.

Bruno

=> Enabled:
CONFIG_IOMMU_SUPPORT
CONFIG_INTEL_IOMMU
CONFIG_INTEL_IOMMU_DEFAULT_ON
CONFIG_IRQ_REMAP

Also tried enabling NUMA, etc:

[0.330998] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[0.331068] ACPI: bus type pci registered
[0.615234] ACPI: Dynamic OEM Table Load:
[0.615373] ACPI: PRAD (null) 000BE (v02 PRADID PRADTID 0001 MSFT 0400)
[0.615631] \_SB_:_OSC invalid UUID
[0.615633] _OSC request data:1 7
[0.663138] pci :ff:13.5: [8086:3c44] type 00 class 0x110100
[0.663170] pci :ff:13.6: [8086:3c45] type 00 class 0x088000
[0.663211] pci:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[0.663281] pci:ff: Unable to request _OSC control (_OSC support mask: 0x08)

:(

Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-----Original Message-----
From: Bjorn Helgaas [mailto:bhelg...@google.com]
Sent: Monday, November 26, 2012 8:00 PM
To: Bruno Prémont
Cc: Justin Piszcz; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

[Try Dan's current email address; sorry Dan]

On Mon, Nov 26, 2012 at 5:56 PM, Bjorn Helgaas wrote:
> [+cc Dan]
>
> On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont wrote:
>> Hi Justin,
>>
>> On Sat, 24 November 2012 "Justin Piszcz" wrote:
>>> Is the following normal on an X9SRL-F board (bios 1.0a)?
>>>
>>> In the manual it states:
>>>
>>> Data Direct I/O
>>> Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
>>> significantly reduces CPU overhead by leveraging CPU architectural
>>> improvements and freeing the system resource for other tasks. The options
>>> are Disabled and Enabled.
>>>
>>> Default is Enabled.
>>>
>>> When enabled in the kernel, I see the following:
>>>
>>> [0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
>>> [0.696487] ioatdma :00:04.0: channel error register unreachable
>>> [0.696546] ioatdma :00:04.0: channel enumeration error
>>> [0.696604] ioatdma :00:04.0: Intel(R) I/OAT DMA Engine init failed
>>> [0.696721] ioatdma :00:04.1: channel error register unreachable
>>> [0.696779] ioatdma :00:04.1: channel enumeration error
>>> [0.697522] ioatdma :00:04.1: Intel(R) I/OAT DMA Engine init failed
>>> [0.697617] ioatdma :00:04.2: channel error register unreachable
>>> [0.697681] ioatdma :00:04.2: channel enumeration error
>>> [0.697739] ioatdma :00:04.2: Intel(R) I/OAT DMA Engine init failed
>>> [0.697831] ioatdma :00:04.3: channel error register unreachable
>>> [0.697890] ioatdma :00:04.3: channel enumeration error
>>> [0.697948] ioatdma :00:04.3: Intel(R) I/OAT DMA Engine init failed
>>> [0.698037] ioatdma :00:04.4: channel error register unreachable
>>> [0.698095] ioatdma :00:04.4: channel enumeration error
>>> [0.698153] ioatdma :00:04.4: Intel(R) I/OAT DMA Engine init failed
>>> [0.698245] ioatdma :00:04.5: channel error register unreachable
>>> [0.698303] ioatdma :00:04.5: channel enumeration error
>>> [0.698360] ioatdma :00:04.5: Intel(R) I/OAT DMA Engine init failed
>>> [0.698449] ioatdma :00:04.6: channel error register unreachable
>>> [0.698508] ioatdma :00:04.6: channel enumeration error
>>> [0.698565] ioatdma :00:04.6: Intel(R) I/OAT DMA Engine init failed
>>> [0.698676] ioatdma :00:04.7: channel error register unreachable
>>> [0.698735] ioatdma :00:04.7: channel enumeration error
>>> [0.698792] ioatdma :00:04.7: Intel(R) I/OAT DMA Engine init failed
>>>
>>> --
>>>
>>> Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
>>> ignored, it fails to work:
>>> [0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
>>>
>>> I assume this is something Supermicro has to fix?
>>
>> You are probably missing some kernel config option(s) :) - I did fight similar
>> issues on a Fujitsu SandyBridge Xeon based server.
>>
>> Check if enabling CONFIG_X86_X2APIC helps as well as other APIC/IOMMU options.
>
> Changing config options is not a valid fix for error messages like
> this. We should be able to make the config smarter by adding
> dependencies or something, or else make the driver smart enough to
> give a more useful diagnostic.
>
> The "channel error register unreachable" message indicates that
> pci_read_config_dword() failed. The register in question
> (IOAT_PCI_CHANERR_INT_OFFSET) is at 0x180, so possibly we don't have
> PCI config accessors for the extended config space (0x100-0xfff). A
> complete dmesg log should show that.

--

Here is the full dmesg:
(I went back to my older kernel, let me know if you need a dmesg w/ those
options enabled)
http://home.comcast.net/~jpiszcz/20121126/dmesg.txt

Justin.
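Bjorn's reasoning above (a register at 0x180 lies beyond the 256 bytes of conventional PCI config space, so reading it needs extended config accessors) can be sketched as a small range check. This is only an illustration under stated assumptions: the constant and function names below are made up for the example, and the only values taken from the thread are the offset 0x180 and the extended range 0x100-0xfff.

```python
# Illustrative sketch only; names are hypothetical, not kernel symbols.
CONVENTIONAL_CFG_SIZE = 0x100   # conventional config space: 0x000-0x0ff
EXTENDED_CFG_SIZE = 0x1000      # PCIe extended config space: 0x000-0xfff
CHANERR_INT_OFFSET = 0x180      # offset quoted in the thread


def needs_extended_config(offset, width=4):
    """True if a `width`-byte access at `offset` reaches past 0xff."""
    if offset < 0 or offset + width > EXTENDED_CFG_SIZE:
        raise ValueError("offset outside PCIe config space")
    return offset + width > CONVENTIONAL_CFG_SIZE


def reachable(offset, visible_size, width=4):
    """True if the access fits inside the config space the kernel can see."""
    return offset >= 0 and offset + width <= visible_size


if __name__ == "__main__":
    # With only conventional accessors (256 bytes visible), 0x180 is out of reach.
    print(reachable(CHANERR_INT_OFFSET, CONVENTIONAL_CFG_SIZE))
    # With extended accessors (4096 bytes visible), it is reachable.
    print(reachable(CHANERR_INT_OFFSET, EXTENDED_CFG_SIZE))
```

As far as I know, the size of a device's sysfs `config` file (256 versus 4096 bytes) reflects the same distinction, but that is worth verifying on the machine in question.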
3.6.8 - CONFIG_IXGBE_HWMON -- how to poll Intel 10GbE temperature?
Hello,

Which user-space application supports reading the temperature from the
10GbE card? Regular lm-sensors does not seem to pick it up.

$ sensors|grep -e -
radeon-pci-0500
coretemp-isa-
nct6776-isa-0a30

Justin.
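For reference, a minimal sketch of polling temperatures straight from sysfs, assuming the driver exposes the standard hwmon `temp*_input` files (values in millidegrees Celsius). Whether ixgbe on 3.6.8 does so is not confirmed anywhere in this thread, and on older kernels the attribute files may sit under `hwmonN/device/` instead, which this sketch does not search. The function name is hypothetical.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, sensor_file): degrees_celsius} for each temp*_input."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip has no "name" attribute.
        name = name_file.read_text().strip() if name_file.is_file() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric sensor; skip it
            readings[(name, sensor.name)] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    for (chip, sensor), temp in read_hwmon_temps().items():
        print(f"{chip} {sensor}: {temp:.1f} C")
```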
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-----Original Message-----
From: Bjorn Helgaas [mailto:bhelg...@google.com]
Sent: Monday, November 26, 2012 8:12 PM
To: Justin Piszcz
Cc: Bruno Prémont; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

On Mon, Nov 26, 2012 at 6:00 PM, Justin Piszcz wrote:
>
> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Monday, November 26, 2012 8:00 PM
> To: Bruno Prémont
> Cc: Justin Piszcz; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan Williams
> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
>
> [Try Dan's current email address; sorry Dan]
>
> On Mon, Nov 26, 2012 at 5:56 PM, Bjorn Helgaas wrote:
>> [+cc Dan]
>>
>> On Mon, Nov 26, 2012 at 2:42 PM, Bruno Prémont wrote:
>>> Hi Justin,
>>>
>>> On Sat, 24 November 2012 "Justin Piszcz" wrote:
>>>> Is the following normal on an X9SRL-F board (bios 1.0a)?
>>>>
>>>> In the manual it states:
>>>>
>>>> Data Direct I/O
>>>> Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
>>>> significantly reduces CPU overhead by leveraging CPU architectural
>>>> improvements and freeing the system resource for other tasks. The
>
> Here is the full dmesg: (I went back to my older kernel, let me know if you
> need a dmesg w/ those options enabled)
> http://home.comcast.net/~jpiszcz/20121126/dmesg.txt

It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?

Hi,

I have two Supermicro boards I am trying this on. I also tried it on the
other system (X8DTH-6F): with all of these options enabled, that system does
not boot. It cannot talk to the SATA boot drive.

" 5520 chips built in, the X8DTH-6/X8DTH-6F/X8DTH-i/X8DTH-iF offers ..
The Intel I/OAT (I/O Acceleration Technology) significantly reduces CPU
overhead by ..."

When the following options are enabled, the system does not boot:
+CONFIG_HAVE_INTEL_TXT=y
+CONFIG_IOMMU_API=y
+CONFIG_IOMMU_SUPPORT=y
+CONFIG_DMAR_TABLE=y
+CONFIG_INTEL_IOMMU=y
+CONFIG_INTEL_IOMMU_DEFAULT_ON=y
+CONFIG_INTEL_IOMMU_FLOPPY_WA=y

It fails like so:

(Fails to talk to the SSD)
http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg

(then, a few moments later: Kernel panic)
http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg

With those options disabled, the system boots (and always has booted fine).

Is there a certain combination of parameters that allows I/OAT to be enabled
_and_ allows the system to boot?

Justin.
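When comparing a booting and a non-booting configuration like the two above, it can help to check a `.config` mechanically for the options named in this thread. The following is a rough sketch, not a kernel tool: the function names are made up for the example, and the default option list simply collects names mentioned in the messages above.

```python
def parse_kconfig(text):
    """Map CONFIG_* names to values from .config-style text ('n' if not set)."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


# Option names taken from the thread; adjust to taste.
THREAD_OPTIONS = [
    "CONFIG_PCI_MMCONFIG",
    "CONFIG_IOMMU_SUPPORT",
    "CONFIG_DMAR_TABLE",
    "CONFIG_INTEL_IOMMU",
    "CONFIG_INTEL_IOMMU_DEFAULT_ON",
]


def missing_options(text, wanted=THREAD_OPTIONS):
    """Return the wanted options that are neither built in (y) nor modular (m)."""
    opts = parse_kconfig(text)
    return [name for name in wanted if opts.get(name, "n") not in ("y", "m")]
```

Feeding it the contents of the running kernel's config (for example from `/boot/config-<version>`, where the distribution ships one) gives a quick list of what differs from the set discussed here.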
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?

===> FOR I/OAT DMA

Latest status: it _appears_ it's working on the X9SRL-F now, thank you!

1) Supermicro X9SRL-F (GOOD)
[0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
[0.738719] ioatdma :00:04.0: irq 75 for MSI/MSI-X
[0.739088] ioatdma :00:04.1: irq 76 for MSI/MSI-X
[0.739408] ioatdma :00:04.2: irq 77 for MSI/MSI-X
[0.739739] ioatdma :00:04.3: irq 78 for MSI/MSI-X
[0.740040] ioatdma :00:04.4: irq 79 for MSI/MSI-X
[0.740342] ioatdma :00:04.5: irq 80 for MSI/MSI-X
[0.740670] ioatdma :00:04.6: irq 81 for MSI/MSI-X
[0.740971] ioatdma :00:04.7: irq 82 for MSI/MSI-X

It is _not_ working on the:

2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
card; could the IRQ for the I/O controller be getting re-mapped and fail?)
Worst case, I can move the SSD from the 6.0 Gbps SATA card to the motherboard
and see if that works, but that kind of defeats the purpose of a 6.0 Gbps
SATA SSD.

(Fails to talk to the SSD)
http://home.comcast.net/~jpiszcz/20121127/photo1-resize.jpg

(then, a few moments later: Kernel panic)
http://home.comcast.net/~jpiszcz/20121127/photo2-resize.jpg

Would be curious if anyone had any suggestions besides removing the
controller card?

--

==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?

[0.632170] pci:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[0.632239] pci:ff: Unable to request _OSC control (_OSC support mask: 0x08)

Justin.
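The good and bad ioatdma logs in this thread differ in an easily scriptable way, so a quick per-device summary can be pulled out of a saved dmesg. This is a heuristic sketch only: treating an "irq N for MSI/MSI-X" line as success is an assumption based on the logs posted here, not on the driver source, and the function name is hypothetical.

```python
import re

# Matches "ioatdma <device>: <message>"; the driver banner line
# ("ioatdma: Intel(R) ...") has no device field and is skipped.
IOAT_LINE = re.compile(r"ioatdma\s+(\S+):\s+(.*)")

FAILURE_MARKERS = ("init failed", "unreachable", "enumeration error")


def ioat_status(dmesg_text):
    """Return {device: 'ok' | 'failed'} from ioatdma lines in dmesg text."""
    status = {}
    for line in dmesg_text.splitlines():
        match = IOAT_LINE.search(line)
        if not match:
            continue
        device, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            status[device] = "failed"          # a failure always wins
        elif "for MSI/MSI-X" in message:
            status.setdefault(device, "ok")    # assumed success indicator
    return status
```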
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
> It is _not_ working on the:
> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worse case I can move the SSD from the 6.0gbpa SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.

When IOMMU is disabled, I/OAT DMA is successful on the second motherboard
(X8DTH-6F).

Specifically:
--- DMA Engine support
[*] Intel I/OAT DMA support
[*] Network: TCP receive copy offload
[*] Async_tx: Offload support for the async_tx api

When IOMMU/X2APIC is enabled on the X8DTH-6F it fails to boot.
Will keep doing more testing to see if I get anywhere w/regards to the IOMMU.

Proof of success:
[0.757467] ioatdma: Intel(R) QuickData Technology Driver 4.00
[0.757690] ioatdma :00:16.0: irq 88 for MSI/MSI-X
[0.757948] ioatdma :00:16.1: irq 89 for MSI/MSI-X
[0.758166] ioatdma :00:16.2: irq 90 for MSI/MSI-X
[0.758377] ioatdma :00:16.3: irq 91 for MSI/MSI-X
[0.758577] ioatdma :00:16.4: irq 92 for MSI/MSI-X
[0.758794] ioatdma :00:16.5: irq 93 for MSI/MSI-X
[0.759000] ioatdma :00:16.6: irq 94 for MSI/MSI-X
[0.759214] ioatdma :00:16.7: irq 95 for MSI/MSI-X
[0.759461] ioatdma :80:16.0: irq 96 for MSI/MSI-X
[0.759720] ioatdma :80:16.1: irq 97 for MSI/MSI-X
[0.759963] ioatdma :80:16.2: irq 98 for MSI/MSI-X
[0.760190] ioatdma :80:16.3: irq 99 for MSI/MSI-X
[0.760414] ioatdma :80:16.4: irq 100 for MSI/MSI-X
[0.760630] ioatdma :80:16.5: irq 101 for MSI/MSI-X
[0.760862] ioatdma :80:16.6: irq 102 for MSI/MSI-X
[0.761081] ioatdma :80:16.7: irq 103 for MSI/MSI-X

--

==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?

[0.632170] pci:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[0.632239] pci:ff: Unable to request _OSC control (_OSC support mask: 0x08)

Justin.
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-----Original Message-----
From: Justin Piszcz [mailto:jpis...@lucidpixels.com]
Sent: Tuesday, November 27, 2012 8:56 AM
To: 'Bjorn Helgaas'
Cc: 'Bruno Prémont'; supp...@supermicro.com; linux-kernel@vger.kernel.org; 'Dan Williams'
Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

> It is _not_ working on the:
> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
> worse case I can move the SSD from the 6.0gbpa SATA card to the motherboard
> and see if that works, but that kind of defeats the purpose of a 6.0gbps
> SATA SSD.

When I removed the Highpoint 2-port SATA card and plugged the SSD into the
motherboard instead, the system booted.

So if you use a HIGHPOINT 2-PORT SATA 6.0 Gbps card, do NOT enable IOMMU or
it will fail to initialize the Highpoint 2-port SATA controller card!

I also tried upgrading the BIOS (of the mobo, no diff).
I also tried just leaving the SATA card in and plugging the SSD into the
motherboard (no diff).

Removed the Highpoint 2-port SATA card and then success. It would be nice to
use that card with IOMMU support though: is it just not compatible
(marvell-problem?) or is it a driver bug? Based on the pictures/etc sent
earlier?

$ dmesg|grep -i iommu
[0.055134] dmar: IOMMU 0: reg_base_addr cfdfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[0.055396] dmar: IOMMU 1: reg_base_addr fecfe000 ver 1:0 cap c90780106f0462 ecap f020f6
[0.760665] IOMMU 0 0xcfdfe000: using Queued invalidation
[0.760803] IOMMU 1 0xfecfe000: using Queued invalidation
[0.760937] IOMMU: Setting RMRR:
[0.761102] IOMMU: Setting identity map for device :00:1d.0 [0xbf7ec000 - 0xbf7f]
[0.761329] IOMMU: Setting identity map for device :00:1d.1 [0xbf7ec000 - 0xbf7f]
[0.761542] IOMMU: Setting identity map for device :00:1d.2 [0xbf7ec000 - 0xbf7f]
[0.761758] IOMMU: Setting identity map for device :00:1d.7 [0xbf7ec000 - 0xbf7f]
[0.761974] IOMMU: Setting identity map for device :00:1a.0 [0xbf7ec000 - 0xbf7f]
[0.762190] IOMMU: Setting identity map for device :00:1a.1 [0xbf7ec000 - 0xbf7f]
[0.762407] IOMMU: Setting identity map for device :00:1a.2 [0xbf7ec000 - 0xbf7f]
[0.762620] IOMMU: Setting identity map for device :00:1a.7 [0xbf7ec000 - 0xbf7f]
[0.762816] IOMMU: Setting identity map for device :00:1d.0 [0xec000 - 0xe]
[0.763010] IOMMU: Setting identity map for device :00:1d.1 [0xec000 - 0xe]
[0.763197] IOMMU: Setting identity map for device :00:1d.2 [0xec000 - 0xe]
[0.763382] IOMMU: Setting identity map for device :00:1d.7 [0xec000 - 0xe]
[0.763567] IOMMU: Setting identity map for device :00:1a.0 [0xec000 - 0xe]
[0.763749] IOMMU: Setting identity map for device :00:1a.1 [0xec000 - 0xe]
[0.763934] IOMMU: Setting identity map for device :00:1a.2 [0xec000 - 0xe]
[0.764127] IOMMU: Setting identity map for device :00:1a.7 [0xec000 - 0xe]
[0.764311] IOMMU: Prepare 0-16MiB unity mapping for LPC
[0.764465] IOMMU: Setting identity map for device :00:1f.0 [0x0 - 0xff]

--

==> Further issues with the X9SRL-F -- does this board support ASPM or is
this a Linux/ASPM implementation issue?

[0.632170] pci:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
[0.632239] pci:ff: Unable to request _OSC control (_OSC support mask: 0x08)

Justin.
3.6.8: dmar: DRHD: handling fault status reg 602
Hello,

Any idea why this is happening (e.g. why is PTE Read Access not set?)

[ 13.204560] dmar: DRHD: handling fault status reg 602
[ 13.208078] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 13.208078] DMAR:[fault reason 06] PTE Read access is not set
[ 15.777874] dmar: DRHD: handling fault status reg 702
[ 15.777879] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 15.777879] DMAR:[fault reason 06] PTE Read access is not set
[ 16.100453] dmar: DRHD: handling fault status reg 2
[ 16.100458] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 16.100458] DMAR:[fault reason 06] PTE Read access is not set
[ 16.141058] dmar: DRHD: handling fault status reg 102
[ 16.141062] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 16.141062] DMAR:[fault reason 06] PTE Read access is not set
[ 16.210102] dmar: DRHD: handling fault status reg 202
[ 16.210111] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 16.210111] DMAR:[fault reason 06] PTE Read access is not set
[ 16.918149] ixgbe :86:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX

This is from: http://lkml.org/lkml/2012/11/27/263

Justin.
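The fault lines above carry two pieces of information that can be pulled apart mechanically: the fault status register value and the requesting device. The sketch below is an illustration under assumptions, not a statement of how the kernel decodes them: the bit layout used (bit 1 as "primary pending fault", bits 15:8 as the fault record index) is my reading of the VT-d specification and should be checked against it, and the function names are made up.

```python
import re

FSTS_PPF = 1 << 1  # assumed: Primary Pending Fault bit


def decode_fault_status(reg_hex):
    """Split a 'fault status reg' value (hex string as printed in dmesg)."""
    value = int(reg_hex, 16)
    return {
        "pending": bool(value & FSTS_PPF),
        "record_index": (value >> 8) & 0xFF,  # assumed: FRI field, bits 15:8
    }


FAULT_LINE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-fA-F:.]+)\] "
    r"fault addr ([0-9a-fA-F]+)"
)


def parse_fault(line):
    """Return (direction, device, fault_address) or None if the line differs."""
    match = FAULT_LINE.search(line)
    if not match:
        return None
    direction, device, addr = match.groups()
    return direction, device, int(addr, 16)
```

Under that reading, the values 602, 702, 2, 102 and 202 in the log would all have the pending bit set and differ only in the record index.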
RE: 3.6.8: dmar: DRHD: handling fault status reg 602
-----Original Message-----
From: Justin Piszcz [mailto:jpis...@lucidpixels.com]
Sent: Tuesday, November 27, 2012 10:16 AM
To: linux-kernel@vger.kernel.org
Subject: 3.6.8: dmar: DRHD: handling fault status reg 602

Hello,

Any idea why this is happening (e.g. why is PTE Read Access not set?)

[ 13.204560] dmar: DRHD: handling fault status reg 602
[ 13.208078] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 13.208078] DMAR:[fault reason 06] PTE Read access is not set
[ 15.777874] dmar: DRHD: handling fault status reg 702
[ 15.777879] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 15.777879] DMAR:[fault reason 06] PTE Read access is not set
[ 16.100453] dmar: DRHD: handling fault status reg 2
[ 16.100458] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 16.100458] DMAR:[fault reason 06] PTE Read access is not set
[ 16.141058] dmar: DRHD: handling fault status reg 102
[ 16.141062] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 16.141062] DMAR:[fault reason 06] PTE Read access is not set
[ 16.210102] dmar: DRHD: handling fault status reg 202
[ 16.210111] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 16.210111] DMAR:[fault reason 06] PTE Read access is not set
[ 16.918149] ixgbe :86:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX

This is from: http://lkml.org/lkml/2012/11/27/263

Justin.

--

Hi,

Disregard, appears to be a nouveau bug:
https://bugzilla.redhat.com/show_bug.cgi?id=573173

Justin.
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-----Original Message-----
From: Bjorn Helgaas [mailto:bhelg...@google.com]
Sent: Wednesday, November 28, 2012 6:54 PM
To: Justin Piszcz
Cc: Bruno Prémont; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

On Tue, Nov 27, 2012 at 6:49 AM, Justin Piszcz wrote:
>
>> It looks like maybe you don't have CONFIG_PCI_MMCONFIG turned on?
>
> ===> FOR I/OAT DMA
> Latest status, it _appears_ its working on the X9SRL-F now, thank you!
>
> 1) Supermicro X9SRL-F (GOOD)
> [0.738510] ioatdma: Intel(R) QuickData Technology Driver 4.00
> [0.738719] ioatdma :00:04.0: irq 75 for MSI/MSI-X
> [0.739088] ioatdma :00:04.1: irq 76 for MSI/MSI-X
> [0.739408] ioatdma :00:04.2: irq 77 for MSI/MSI-X
> [0.739739] ioatdma :00:04.3: irq 78 for MSI/MSI-X
> [0.740040] ioatdma :00:04.4: irq 79 for MSI/MSI-X
> [0.740342] ioatdma :00:04.5: irq 80 for MSI/MSI-X
> [0.740670] ioatdma :00:04.6: irq 81 for MSI/MSI-X
> [0.740971] ioatdma :00:04.7: irq 82 for MSI/MSI-X

Good. You have two issues, and I'm going to separate them and only
address the first one here.

I opened a bug report [1] against the IOAT driver. It should do
something more useful when CONFIG_PCI_MMCONFIG=n so we don't have to
debug this again in the future. But otherwise, it sounds like this
issue is resolved.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=51101

--

Yes--(agree w/ config option)

Thank you!

Justin.
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-----Original Message-----
From: Bjorn Helgaas [mailto:bhelg...@google.com]
Sent: Wednesday, November 28, 2012 7:09 PM
To: Justin Piszcz
Cc: Bruno Prémont; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan Williams
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

On Tue, Nov 27, 2012 at 7:35 AM, Justin Piszcz wrote:
>
> -----Original Message-----
> From: Justin Piszcz [mailto:jpis...@lucidpixels.com]
> Sent: Tuesday, November 27, 2012 8:56 AM
> To: 'Bjorn Helgaas'
> Cc: 'Bruno Prémont'; supp...@supermicro.com; linux-kernel@vger.kernel.org; 'Dan Williams'
> Subject: RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
>
>> It is _not_ working on the:
>> 2) Supermicro X8DTH-F (the boot drive in this system is running off a PCI-e
>> card, could the IRQ for the I/O controller be getting re-mapped and fail?)--
>> worse case I can move the SSD from the 6.0gbpa SATA card to the motherboard
>> and see if that works, but that kind of defeats the purpose of a 6.0gbps
>> SATA SSD.
>
> When I removed the Highpoint 2-port SATA card and plugged it into the
> motherboard, the system boots (plugged the SSD into the motherboard).
> So if you use a HIGHPOINT 2-PORT SATA 6.0gbps card, do NOT enable IOMMU or
> it will fail to initialize the Highpoint 2-port SATA controller card!
> I also tried upgrading the BIOS (of the mobo, no diff)
> I also tried just leaving the SATA card in and plugging it into the
> motherboard (no diff)
> Removed the Highpoint 2-port SATA card and then success, it would be nice to
> use that card with IOMMU support though, is it just not compatible
> (marvell-problem?) or is a driver bug? Based on the pictures/etc sent
> earlier?

I would guess this is a core bug, but it's hard to tell without more
information.

If you boot with "intel_iommu=off", I would guess the Highpoint card
would work (this should have the same effect as turning off
CONFIG_INTEL_IOMMU).

I'd like to compare the complete dmesg log for that boot with the one
that fails. It sounds like it might be hard to collect the log for
the failing case -- you said the boot fails when the Highpoint card is
in the system even if the SSD is connected to the motherboard instead
of the Highpoint card.

The panic in the photo2 image looks like it's just a failure to mount
the root filesystem, which is what I'd expect if we can't find the
SSD. It seems like we ought to be able to *boot* with the SSD
connected to the motherboard, even if the Highpoint card doesn't work.
But worst-case, a video of the failing boot might be enough,
especially if you can slow it down with "boot_delay="

--

SUMMARY: The card fails with IOMMU support in the kernel. However, the
system does now boot (3.6.8) with the card installed, as long as the system
disk isn't attached to it; not sure what was wrong earlier.

It seems to be working now:
=> SSD on motherboard
=> PCI-e card (Highpoint in the system but not used, no disks attached)

(After I enabled nouveau; not sure that has anything to do with it.)
I put the card in, and it errors as usual, but with the SSD now on the
motherboard the system boots successfully.

Here are the errors from the kernel trying to initialize the board with
IOMMU enabled (retrieved via netconsole); also see the picture below
(w/help from boot_delay=100 && nouveau enabled):

http://home.comcast.net/~jpiszcz/20121128/highpoint.jpg

Nov 28 19:30:16 p34 [7.771060] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:16 p34 [8.270153] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:30:17 p34 [9.073935] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Nov 28 19:30:27 p34 [ 19.058915] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:28 p34 [ 19.557885] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:30:28 p34 [ 19.558478] ata14: limiting SATA link speed to 1.5 Gbps
Nov 28 19:30:29 p34 [ 20.363658] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 28 19:30:48 p34 [ 39.568234] dmar: DRHD: handling fault status reg 502
Nov 28 19:30:48 p34 [ 39.571508] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 39.571508] DMAR:[fault reason 06] PTE Read access is not set
Nov 28 19:30:59 p34 [ 50.318146] ata14.00: qc timeout (cmd 0xa1)
Nov 28 19:30:59 p34 [ 50.818061] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 28 19:31:00 p34 [ 51.621827] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

Justin.
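The `err_mask=0x4` value in the ata14 lines above is a bit mask, so it can be decoded into named flags. The table below is written from memory of the `AC_ERR_*` definitions in the kernel's `include/linux/libata.h` and should be checked against the tree actually in use; the function name is hypothetical. Likewise, `cmd 0xa1` is, as far as I recall, the ATA IDENTIFY PACKET DEVICE opcode.

```python
# Bit positions recalled from include/linux/libata.h (an assumption;
# verify against the kernel source before relying on it).
AC_ERR_BITS = {
    0: "AC_ERR_DEV",
    1: "AC_ERR_HSM",
    2: "AC_ERR_TIMEOUT",
    3: "AC_ERR_MEDIA",
    4: "AC_ERR_ATA_BUS",
    5: "AC_ERR_HOST_BUS",
    6: "AC_ERR_SYSTEM",
    7: "AC_ERR_INVALID",
    8: "AC_ERR_OTHER",
    9: "AC_ERR_NODEV_HINT",
    10: "AC_ERR_NCQ",
}


def decode_err_mask(mask):
    """Return the names of all flags set in a libata err_mask value."""
    return [name for bit, name in sorted(AC_ERR_BITS.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    print(decode_err_mask(0x4))
```

If the table is right, 0x4 decodes to a timeout, which would be consistent with the "qc timeout" line that precedes each IDENTIFY failure.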
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-----Original Message-----
From: Robert Hancock [mailto:hancock...@gmail.com]
Sent: Wednesday, November 28, 2012 7:35 PM
To: Justin Piszcz
Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; supp...@supermicro.com; linux-kernel@vger.kernel.org; 'Dan Williams'
Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

What does lspci -vv show on that controller? Not sure what actual
chipset that controller is, but there's a known issue with some Marvell
6Gbps SATA controllers with DMAR enabled - it seems the device issues
memory read/write requests from the wrong PCI function ID and the IOMMU
rightly denies access as the function listed in the requests doesn't
have any mapping to that memory. I don't think there's presently a
workaround other than disabling DMAR. We could (and likely should) be
detecting that device and adding some kind of quirk for it.

That sounds likely...
It is shown below:

Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
Adapter

lspci -vv output:

84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

> --
>
> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
> this a Linux/ASPM implementation issue?
> [0.632170] pci:ff: ACPI _OSC support notification failed, disabling
> PCIe ASPM
> [0.632239] pci:ff: Unable to request _OSC control (_OSC support
> mask: 0x08)

What's the full dmesg from this machine (or is it already posted somewhere)?

It is now available here:
http://home.comcast.net/~jpiszcz/20121128/dmesg.txt

Justin.
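Robert's description (DMA requests tagged with the wrong PCI function, so the IOMMU finds no mapping for the requester) comes down to comparing requester IDs. A small sketch of that comparison follows, assuming the usual 16-bit encoding of bus, device and function; the function names are made up, and the addresses in the usage note are only sample input taken from this thread, with no conclusion drawn about which device raised the faults.

```python
def requester_id(bdf):
    """Convert 'bb:dd.f' (optionally 'dddd:bb:dd.f') to a 16-bit requester ID."""
    bus_dev, function = bdf.split(".")
    parts = bus_dev.split(":")
    bus, device = int(parts[-2], 16), int(parts[-1], 16)
    fn = int(function, 16)
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError(f"invalid PCI address: {bdf}")
    return (bus << 8) | (device << 3) | fn


def same_device_other_function(a, b):
    """True if a and b share bus and device but differ in function number."""
    rid_a, rid_b = requester_id(a), requester_id(b)
    return rid_a != rid_b and (rid_a >> 3) == (rid_b >> 3)
```

For example, `same_device_other_function("84:00.0", "84:00.1")` is true, while comparing `"84:00.0"` (the lspci address above) with `"04:00.0"` (the address in the DMAR fault lines) is false because the bus numbers differ.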
RE: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
-Original Message- From: Robert Hancock [mailto:hancock...@gmail.com] Sent: Wednesday, November 28, 2012 7:55 PM To: Justin Piszcz Cc: Bjorn Helgaas; Bruno Prémont; supp...@supermicro.com; linux-kernel@vger.kernel.org; Dan Williams Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz wrote: > > > -Original Message- > From: Robert Hancock [mailto:hancock...@gmail.com] > Sent: Wednesday, November 28, 2012 7:35 PM > To: Justin Piszcz > Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; supp...@supermicro.com; > linux-kernel@vger.kernel.org; 'Dan Williams' > Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware > bug question > > > What does lspci -vv show on that controller? Not sure what actual > chipset that controller is, but there's a known issue with some Marvell > 6Gbps SATA controllers with DMAR enabled - it seems the device issues > memory read/write requests from the wrong PCI function ID and the IOMMU > rightly denies access as the function listed in the requests doesn't > have any mapping to that memory. I don't think there's presently a > workaround other than disabling DMAR. We could (and likely should) be > detecting that device and adding some kind of quirk for it. > > That sounds likely... > It is shown below: > > Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host > Adapter > > lspci -vv output: > > 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA > 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0]) > Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s > controller Yeah, that's one of those controllers I think. But I can't tell from the bit of the dmesg you posted exactly what's going on. Can you post a full boot log from having the card installed and some drive attached (by putting the boot drive on another controller for example)? 
>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is >> this a Linux/ASPM implementation issue? >> [0.632170] pci:ff: ACPI _OSC support notification failed, > disabling >> PCIe ASPM >> [0.632239] pci:ff: Unable to request _OSC control (_OSC support >> mask: 0x08) > > What's the full dmesg from this machine (or is it already posted somewhere)? > > It is now available here: > http://home.comcast.net/~jpiszcz/20121128/dmesg.txt > Is that the same boot log? It doesn't have this error in it. Yes, the error is here: (its towards the bottom) [7.973015] ata14.00: qc timeout (cmd 0xa1) [8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4) [9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 19.260667] ata14.00: qc timeout (cmd 0xa1) [ 19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4) [ 19.760451] ata14: limiting SATA link speed to 1.5 Gbps [ 20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 50.521078] ata14.00: qc timeout (cmd 0xa1) [ 51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4) [ 51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 51.824682] dmar: DRHD: handling fault status reg 502 [ 51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0 [ 51.824686] DMAR:[fault reason 06] PTE Read access is not set [ 52.338871] EXT3-fs (sdb2): error: couldn't mount because of unsupported optional features (240) [ 52.348938] EXT2-fs (sdb2): error: couldn't mount because of unsupported optional features (240) [ 52.360314] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null) The system does not boot when the SSD is on that SATA controller. The error we were trying to get earlier (kernel panic)-- I cannot reproduce that anymore after adding nouveau for whatever reason. So to re-cap it boots now with nothing connected to the controller but the controller is non-workable/useless, as shown above. 
When you put the SSD on it, it cannot mount rootfs. Justin.
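For anyone triaging a similar log, here is a minimal sketch (not part of the original thread) that pulls the requesting device and the fault reason out of DMAR fault lines like the ones quoted above. The regular expressions are assumptions derived only from the line format in this dmesg excerpt; other kernel versions may word these lines differently.

```python
import re

# Both patterns are assumptions based on the dmesg excerpt in this thread.
REQUEST_RE = re.compile(
    r"DMAR:\[(?P<op>DMA (?:Read|Write))\] Request device "
    r"\[(?P<dev>[0-9a-fA-F:.]+)\] fault addr (?P<addr>[0-9a-fA-F]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault request line, with its reason attached."""
    faults = []
    for line in log_text.splitlines():
        request = REQUEST_RE.search(line)
        if request:
            faults.append({
                "op": request.group("op"),
                "device": request.group("dev"),
                "addr": request.group("addr"),
                "reason": None,
            })
            continue
        reason = REASON_RE.search(line)
        # Attach the reason line to the most recent request that lacks one.
        if reason and faults and faults[-1]["reason"] is None:
            faults[-1]["reason"] = (reason.group("code"),
                                    reason.group("text").strip())
    return faults


SAMPLE = """\
[ 51.824682] dmar: DRHD: handling fault status reg 502
[ 51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
[ 51.824686] DMAR:[fault reason 06] PTE Read access is not set
"""

if __name__ == "__main__":
    for fault in parse_dmar_faults(SAMPLE):
        print(fault)
```

Comparing the reported device against the lspci listing is then a manual step; the sketch only extracts the fields.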
kernel 3.4.0 (i686): pagevec_lookup_tag+0x21/0x30
Hi, Was curious why this occurs with postfix's cleanup every so often? [562320.275125] INFO: task cleanup:5921 blocked for more than 120 seconds. [562320.275127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [562320.275129] cleanup D 000d 0 5921 2437 0x [562320.275132] f2986080 0086 ca379ed4 000d 0001 c16ca000 c1078521 000e [562320.275137] ca379edc c106f3c6 0001 000e 0001 ce0a965c [562320.275141] f5a84070 0003 0001 f5a8406c 0292 f5a84014 c1043c5e [562320.275146] Call Trace: [562320.275148] [] ? pagevec_lookup_tag+0x21/0x30 [562320.275151] [] ? filemap_fdatawait_range+0x86/0x140 [562320.275154] [] ? __wake_up+0x3e/0x60 [562320.275156] [] ? prepare_to_wait+0x1d/0x70 [562320.275159] [] ? jbd2_log_wait_commit+0x95/0x100 [562320.275162] [] ? abort_exclusive_wait+0x90/0x90 [562320.275164] [] ? ext4_sync_file+0x13c/0x2c0 [562320.275167] [] ? chmod_common+0x74/0x90 [562320.275169] [] ? vfs_write+0x106/0x140 [562320.275172] [] ? ext4_flush_completed_IO+0x90/0x90 [562320.275175] [] ? vfs_fsync+0x2b/0x40 [562320.275177] [] ? sys_fsync+0x20/0x40 [562320.275180] [] ? sysenter_do_call+0x12/0x26 Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
3.6.0 kernel - ext4 corruption, or?
Hi,

I read there was a bug in 3.6.2; is there also one in 3.6.0, or can someone help explain this? I booted systemrescuecd 3.0.0 and ran fsck.ext4 -f on the partition, and no errors were reported. It seems the inode for the directory is missing?

# grep 10.0.0.11 -r /etc
/etc/posfix.old/master.cf:10.0.0.11:smtp inet n - - -1postscreen

# ls -l /etc/posfix.old/master.cf
-rw-r--r-- 1 root root 13236 Oct 28 2011 /etc/posfix.old/master.cf

# ls -ld /etc/postfix.old
ls: cannot access /etc/postfix.old: No such file or directory

Justin.
RE: 3.6.0 kernel - ext4 corruption, or?
-Original Message-
From: ty...@mit.edu [mailto:ty...@mit.edu]
Sent: Wednesday, October 24, 2012 10:40 AM
To: Justin Piszcz; Steven J. Magnani
Cc: linux-kernel@vger.kernel.org
Subject: Re: 3.6.0 kernel - ext4 corruption, or?

On Wed, Oct 24, 2012 at 08:15:37AM -0400, Justin Piszcz wrote:
>
> I read there was a bug in 3.6.2, is there also one in 3.6.0, or can someone
> help explain this?

The problem which we are currently trying to investigate was reportedly introduced in v3.6.1. So far that's about all we know; we have two users who have reported it, but I and other ext4 developers haven't been able to reproduce it yet.

Got it.

> # grep 10.0.0.11 -r /etc
> /etc/posfix.old/master.cf:10.0.0.11:smtp inet n - -
> -1postscreen
>
> # ls -l /etc/posfix.old/master.cf
> -rw-r--r-- 1 root root 13236 Oct 28 2011 /etc/posfix.old/master.cf
>
> # ls -ld /etc/postfix.old
> ls: cannot access /etc/postfix.old: No such file or directory
>

Looks like you or some script renamed /etc/postfix to /etc/posfix.old as part of some upgrade?

Whoops, sorry about that, typo. So for now (concerning the other bug) it's best to stay with 3.6.0 until the bug is found, thank you.

Justin.
3.6.0 ext4 dump/filemap_fault?
Hello, Any idea what happened here (during a backup)? Partition is ext4. [116868.118797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [116868.118798] dumpD 88003d32a5c0 0 21219 21214 0x [116868.118801] 880584631d28 0082 880584631d48 880584631fd8 [116868.118803] 880584631fd8 4000 81a14420 88003d32a5c0 [116868.118804] 8807d41449c0 88081bcea828 880584631cb8 810b88a7 [116868.118806] Call Trace: [116868.118812] [] ? filemap_fault+0x87/0x430 [116868.118814] [] ? __dequeue_entity+0x2a/0x50 [116868.118817] [] schedule+0x24/0x70 [116868.118820] [] schedule_timeout+0x17c/0x1d0 [116868.118822] [] ? check_preempt_curr+0x75/0xa0 [116868.118823] [] ? ttwu_do_wakeup+0x12/0x90 [116868.118824] [] ? ttwu_do_activate.constprop.60+0x61/0x70 [116868.118826] [] wait_for_common+0xc2/0x150 [116868.118827] [] ? try_to_wake_up+0x2d0/0x2d0 [116868.118829] [] ? fdatawrite_one_bdev+0x20/0x20 [116868.118830] [] wait_for_completion+0x18/0x20 [116868.118832] [] sync_inodes_sb+0x9e/0x1b0 [116868.118834] [] ? fdatawrite_one_bdev+0x20/0x20 [116868.118835] [] sync_inodes_one_sb+0x19/0x20 [116868.118837] [] iterate_supers+0xe1/0xf0 [116868.118838] [] sys_sync+0x30/0x90 [116868.118839] [] system_call_fastpath+0x1a/0x1f Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 3.6.0 ext4 dump/filemap_fault?
On Sun, Oct 28, 2012 at 6:51 PM, Theodore Ts'o wrote:
> On Sun, Oct 28, 2012 at 05:02:17PM -0400, Justin Piszcz wrote:
>> Hello,
>>
>> Any idea what happened here (during a backup)?
>
> A sync system call took longer than two minutes. Why that happened,
> it's harder to say. It's a warning, though, and not a fatal panic or
> kernel oops.

Ah, got it.

> How much memory do you have in your system?

32GB memory/32GB swap

> What happened afterwards?

It did eventually complete.

> Did the system continue, and did the sync command (I presume you ran
> "sync" from the command line?) finally return to the command prompt?

In this case I did not run sync, I waited for the processes/dump/etc to complete.

Thanks.

Justin.
Re: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard
Thanks for these details!

On Thu, Jul 11, 2013 at 4:47 AM, Jean Delvare wrote:
> Hi Guenter, Justin,
>
> On Wed, 3 Jul 2013 07:42:01 -0700, Guenter Roeck wrote:
>> On Wed, Jul 03, 2013 at 08:35:59AM -0400, Justin Piszcz wrote:
>> > I also found:
>> > http://www.lm-sensors.org/wiki/Configurations/SuperMicro/X9SRA
>> >
>> > Does Super Micro also have such a config file for the X9SRL-F?
>> > This board uses a NCT6776F.
>>
>> Supermicro does not provide configuration files. You can take the above file,
>> test and update it, and let us know so we can add it to the wiki.
>
> Actually they do provide configuration information. They have their own
> tool named SuperDoctor, which can be downloaded from:
> ftp://ftp.supermicro.com/utility/SuperDoctor_II/Linux/Release/
>
> If you look at file AllSuperD.ini, you'll find per-board entries
> describing the input mapping. It is very helpful when writing a custom
> libsensors configuration file for the board in question. It takes some
> knowledge of the monitoring chip or its driver though, as the ini file
> references register addresses ("Offset" in the file.)
>
> I gave a quick look and at least the voltage input mapping (and
> presumably the voltage scaling factors as well) is similar to the
> X9SRA so you can reuse this part of the X9SRA configuration file.
>
> Hope that helps,
> --
> Jean Delvare
[no subject]
subscribe linux-raid
3.10: discard/trim support on md-raid1?
Hello,

Running 3.10 and I see the following for an md-raid1 of two SSDs:

Checking /sys/block/md1/queue:
add_random: 0
discard_granularity: 512
discard_max_bytes: 2147450880
discard_zeroes_data: 0
hw_sector_size: 512
iostats: 0
logical_block_size: 512
max_hw_sectors_kb: 32767
max_integrity_segments: 0
max_sectors_kb: 512
max_segment_size: 65536
max_segments: 168
minimum_io_size: 512
nomerges: 0
nr_requests: 128
optimal_io_size: 0
physical_block_size: 512
read_ahead_kb: 8192
rotational: 1
rq_affinity: 0
scheduler: none
write_same_max_bytes: 0

What should be seen:
rotational: 0

And possibly:
discard_zeroes_data: 1

Can anyone confirm if there is a workaround to allow TRIM when using md-raid1?

Some related discussion here:
http://us.generation-nt.com/answer/md-rotational-attribute-help-206571222.html
http://www.progtown.com/topic343938-ssd-strange-itself-conducts.html

Justin.
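A minimal sketch (not from the original post) of how the same check could be scripted: it reads a few of the queue attributes listed above and summarizes them. The attribute names are the ones shown in the post; the helper names and the default device path are only illustrative assumptions.

```python
import os


def read_queue_attrs(queue_dir, names=("rotational", "discard_granularity",
                                       "discard_max_bytes",
                                       "discard_zeroes_data")):
    """Read selected attribute files from a block device queue directory."""
    attrs = {}
    for name in names:
        path = os.path.join(queue_dir, name)
        try:
            with open(path) as handle:
                attrs[name] = handle.read().strip()
        except OSError:
            attrs[name] = None  # attribute missing or unreadable
    return attrs


def summarize(attrs):
    """Turn raw attribute strings into a short summary."""
    rotational = attrs.get("rotational")
    max_bytes = attrs.get("discard_max_bytes")
    return {
        "reported_as_ssd": rotational == "0",
        # A discard_max_bytes of 0 means the queue does not support discard.
        "discard_advertised": max_bytes not in (None, "", "0"),
    }


if __name__ == "__main__":
    # Hypothetical default; pass the queue directory of the array to check.
    attrs = read_queue_attrs("/sys/block/md1/queue")
    print(attrs)
    print(summarize(attrs))
```

With the values posted above, this would report discard as advertised on md1 while the array is still flagged as rotational, so the two questions (TRIM support and the rotational flag) may be separate issues.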
3.11 kernel: perf samples too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Hello,

I recently upgraded from 3.10 to 3.11 and see these on occasion in the kernel log:

perf samples too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
perf samples too long (5040 > 5000), lowering kernel.perf_event_max_sample_rate to 25000

I was curious what is causing this and whether there is a recommended fix?

Thanks,

Justin.
3.6.0 ACPI: e1000e - Invalid Power Resource to register! (Supermicro X9SCM-F) - Disabling IRQ #44
Hello, Kernel: 3.6.0 (x86_64) Distribution: Debian Testing I was copying 600GB of files from Samba/Linux to a Windows host, it copied around 500GB, then this happened, it disabled the network interface on my Supermicro X9CM-F board (on-board) -- the first interface, any idea why this happened? [93593.565667] irq 44: nobody cared (try booting with the "irqpoll" option) [93593.565673] Pid: 0, comm: swapper/0 Not tainted 3.6.0 #4 [93593.565675] Call Trace: [0.971861] ACPI: Invalid Power Resource to register! [93593.565677][] __report_bad_irq+0x31/0xd0 [93593.565690] [] note_interrupt+0x1a3/0x1f0 [93593.565694] [] handle_irq_event_percpu+0x89/0x160 [93593.565697] [] handle_irq_event+0x3c/0x60 [93593.565700] [] handle_edge_irq+0x6f/0x110 [93593.565705] [] handle_irq+0x1d/0x30 [93593.565709] [] do_IRQ+0x55/0xd0 [93593.565714] [] common_interrupt+0x67/0x67 [93593.565715][] ? __hrtimer_start_range_ns+0x1bd/0x3b0 [93593.565728] [] ? acpi_idle_enter_c1+0xaa/0xcf [93593.565731] [] ? acpi_idle_enter_c1+0x89/0xcf [93593.565735] [] cpuidle_enter+0x19/0x20 [93593.565738] [] cpuidle_idle_call+0x88/0x100 [93593.565750] [] cpu_idle+0x5f/0xd0 [93593.565752] [] rest_init+0x68/0x74 [93593.565755] [] start_kernel+0x2a8/0x2b5 [93593.565756] [] ? repair_env_string+0x5e/0x5e [93593.565758] [] x86_64_start_reservations+0x101/0x105 [93593.565759] [] x86_64_start_kernel+0xd8/0xdc [93593.565760] handlers: [93593.565762] [] e1000_msix_other [93593.565763] Disabling IRQ #44 Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: 3.6.0 ACPI: e1000e - Invalid Power Resource to register! (Supermicro X9SCM-F) - Disabling IRQ #44
Kernel: 3.6.0 (x86_64) Distribution: Debian Testing I was copying 600GB of files from Samba/Linux to a Windows host, it copied around 500GB, then this happened, it disabled the network interface on my Supermicro X9CM-F board (on-board) -- the first interface, any idea why this happened? [93593.565667] irq 44: nobody cared (try booting with the "irqpoll" option) [93593.565673] Pid: 0, comm: swapper/0 Not tainted 3.6.0 #4 [93593.565675] Call Trace: [0.971861] ACPI: Invalid Power Resource to register! [93593.565677][] __report_bad_irq+0x31/0xd0 [93593.565690] [] note_interrupt+0x1a3/0x1f0 [93593.565694] [] handle_irq_event_percpu+0x89/0x160 [93593.565697] [] handle_irq_event+0x3c/0x60 [93593.565700] [] handle_edge_irq+0x6f/0x110 [93593.565705] [] handle_irq+0x1d/0x30 [93593.565709] [] do_IRQ+0x55/0xd0 [93593.565714] [] common_interrupt+0x67/0x67 [93593.565715][] ? __hrtimer_start_range_ns+0x1bd/0x3b0 [93593.565728] [] ? acpi_idle_enter_c1+0xaa/0xcf [93593.565731] [] ? acpi_idle_enter_c1+0x89/0xcf [93593.565735] [] cpuidle_enter+0x19/0x20 [93593.565738] [] cpuidle_idle_call+0x88/0x100 [93593.565750] [] cpu_idle+0x5f/0xd0 [93593.565752] [] rest_init+0x68/0x74 [93593.565755] [] start_kernel+0x2a8/0x2b5 [93593.565756] [] ? repair_env_string+0x5e/0x5e [93593.565758] [] x86_64_start_reservations+0x101/0x105 [93593.565759] [] x86_64_start_kernel+0xd8/0xdc [93593.565760] handlers: [93593.565762] [] e1000_msix_other [93593.565763] Disabling IRQ #44 --- A known issue with this hardware/nics it appears: https://www.centos.org/modules/newbb/viewtopic.php?topic_id=34820 Hopefully someone from Intel can chime in. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
X9SCM-F/82574L/e1000e lag / high latency (e1000e/Intel bug)
Hi, Good news: Supermicro 2.0b fixes an unrelated problem where only 16GB is addressed in the BIOS when you have 32GB on the system, with 2.0b that is resolved. Bad news: This bug still remains (E1000): When you transfer a file/files over Samba, the latency shoots up really high (this also affects other applications!) This bug has been bothering me for months (random lag) during high network I/O on my X9SCM-F motherboard. There is a lot of discussion about this problem here: http://sourceforge.net/p/e1000/bugs/27/?page=4 I tried the EEPROM fix but it did not work: http://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574 _or_82583/ "The value at offset 0x001e (58) has bit 1 unset. This enables the problematic power saving feature. In this case, the EEPROM needs to read "5a" at offset 0x001e." # ethtool -e eth4 |grep 0x0010 0x0010: ff ff ff ff 6b 02 00 00 d9 15 d3 10 ff ff 5a a5 Yes I did reboot. Here is what the problem looks like, during a SAMBA copy from A->B where B=X9SCM running Linux: (ONBOARD Intel eth0 / 82574L ) $ ping windowspc PING windowspc (192.168.0.1) 56(84) bytes of data. 
64 bytes from windowspc (192.168.0.1): icmp_req=1 ttl=128 time=0.544 ms 64 bytes from windowspc (192.168.0.1): icmp_req=2 ttl=128 time=0.193 ms 64 bytes from windowspc (192.168.0.1): icmp_req=3 ttl=128 time=0.619 ms 64 bytes from windowspc (192.168.0.1): icmp_req=4 ttl=128 time=0.642 ms 64 bytes from windowspc (192.168.0.1): icmp_req=5 ttl=128 time=0.426 ms 64 bytes from windowspc (192.168.0.1): icmp_req=6 ttl=128 time=0.464 ms 64 bytes from windowspc (192.168.0.1): icmp_req=7 ttl=128 time=0.696 ms 64 bytes from windowspc (192.168.0.1): icmp_req=8 ttl=128 time=1353 ms 64 bytes from windowspc (192.168.0.1): icmp_req=9 ttl=128 time=353 ms 64 bytes from windowspc (192.168.0.1): icmp_req=10 ttl=128 time=0.492 ms 64 bytes from windowspc (192.168.0.1): icmp_req=11 ttl=128 time=0.618 ms 64 bytes from windowspc (192.168.0.1): icmp_req=12 ttl=128 time=0.474 ms 64 bytes from windowspc (192.168.0.1): icmp_req=13 ttl=128 time=0.542 ms 64 bytes from windowspc (192.168.0.1): icmp_req=14 ttl=128 time=0.471 ms 64 bytes from windowspc (192.168.0.1): icmp_req=15 ttl=128 time=0.645 ms 64 bytes from windowspc (192.168.0.1): icmp_req=16 ttl=128 time=0.394 ms 64 bytes from windowspc (192.168.0.1): icmp_req=17 ttl=128 time=0.537 ms 64 bytes from windowspc (192.168.0.1): icmp_req=18 ttl=128 time=0.706 ms 64 bytes from windowspc (192.168.0.1): icmp_req=19 ttl=128 time=0.465 ms 64 bytes from windowspc (192.168.0.1): icmp_req=20 ttl=128 time=0.707 ms 64 bytes from windowspc (192.168.0.1): icmp_req=21 ttl=128 time=348 ms 64 bytes from windowspc (192.168.0.1): icmp_req=22 ttl=128 time=0.703 ms 64 bytes from windowspc (192.168.0.1): icmp_req=23 ttl=128 time=0.560 ms 64 bytes from windowspc (192.168.0.1): icmp_req=24 ttl=128 time=0.554 ms 64 bytes from windowspc (192.168.0.1): icmp_req=25 ttl=128 time=0.585 ms 64 bytes from windowspc (192.168.0.1): icmp_req=26 ttl=128 time=0.508 ms 64 bytes from windowspc (192.168.0.1): icmp_req=27 ttl=128 time=345 ms 64 bytes from windowspc (192.168.0.1): 
icmp_req=28 ttl=128 time=0.374 ms 64 bytes from windowspc (192.168.0.1): icmp_req=29 ttl=128 time=0.728 ms 64 bytes from windowspc (192.168.0.1): icmp_req=30 ttl=128 time=0.537 ms 64 bytes from windowspc (192.168.0.1): icmp_req=31 ttl=128 time=0.190 ms 64 bytes from windowspc (192.168.0.1): icmp_req=32 ttl=128 time=0.204 ms 64 bytes from windowspc (192.168.0.1): icmp_req=33 ttl=128 time=0.239 ms Same test (copy test) with samba as above but now with an Intel 4-port NIC: $ ping windowspc 64 bytes from windowspc (192.168.0.1): icmp_req=1 ttl=128 time=0.175 ms 64 bytes from windowspc (192.168.0.1): icmp_req=2 ttl=128 time=0.332 ms 64 bytes from windowspc (192.168.0.1): icmp_req=3 ttl=128 time=0.276 ms 64 bytes from windowspc (192.168.0.1): icmp_req=4 ttl=128 time=0.221 ms 64 bytes from windowspc (192.168.0.1): icmp_req=5 ttl=128 time=0.518 ms 64 bytes from windowspc (192.168.0.1): icmp_req=6 ttl=128 time=0.157 ms 64 bytes from windowspc (192.168.0.1): icmp_req=7 ttl=128 time=0.222 ms 64 bytes from windowspc (192.168.0.1): icmp_req=8 ttl=128 time=0.605 ms 64 bytes from windowspc (192.168.0.1): icmp_req=9 ttl=128 time=0.335 ms 64 bytes from windowspc (192.168.0.1): icmp_req=10 ttl=128 time=0.679 ms 64 bytes from windowspc (192.168.0.1): icmp_req=11 ttl=128 time=0.223 ms 64 bytes from windowspc (192.168.0.1): icmp_req=12 ttl=128 time=0.189 ms 64 bytes from windowspc (192.168.0.1): icmp_req=13 ttl=128 time=0.432 ms 64 bytes from windowspc (192.168.0.1): icmp_req=14 ttl=128 time=0.235 ms 64 bytes from windowspc (192.168.0.1): icmp_req=15 ttl=128 time=0.386 ms 64 bytes from windowspc (192.168.0.1): icmp_req=16 ttl=128 time=0.658 ms 64 bytes from windowspc (192.168.0.1): icmp_req=17 ttl=128 time=0.430 ms 64 bytes from windowspc (192.168.0.1): icmp_req=18 ttl=128 time=0.494 ms 64 bytes from windowspc (192.168.0.1): icmp_req=19 ttl=128 time=0.
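The EEPROM check quoted in the message above (bit 1 of the byte at offset 0x001e) can be done mechanically instead of by eye. This is a minimal sketch, assuming each `ethtool -e` dump line is a base address followed by one hex byte per column, as in the line pasted in the message; the helper names are hypothetical.

```python
def eeprom_byte(dump_line, offset):
    """Return the byte at `offset` from one `ethtool -e` hex dump line.

    A line looks like "0x0010: ff ff ... 5a a5": a base address followed
    by the bytes stored from that address onward.
    """
    base_text, _, bytes_text = dump_line.partition(":")
    base = int(base_text.strip(), 16)
    values = [int(token, 16) for token in bytes_text.split()]
    index = offset - base
    if not 0 <= index < len(values):
        raise ValueError("offset 0x%04x is not on this line" % offset)
    return values[index]


def bit_is_set(value, bit):
    """True if the given bit (0 = least significant) is set in value."""
    return bool(value & (1 << bit))


# The dump line pasted in the message above.
LINE = "0x0010: ff ff ff ff 6b 02 00 00 d9 15 d3 10 ff ff 5a a5"

if __name__ == "__main__":
    value = eeprom_byte(LINE, 0x1E)
    print("byte at 0x1e = 0x%02x, bit 1 set: %s"
          % (value, bit_is_set(value, 1)))
```

For the pasted line this reads 0x5a at offset 0x1e with bit 1 set, which matches the statement that the fix was already applied on this board.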
RE: X9SCM-F/82574L/e1000e lag / high latency (e1000e/Intel bug)
-Original Message- From: Justin Piszcz [mailto:jpis...@lucidpixels.com] Sent: Tuesday, October 09, 2012 6:15 PM To: linux-kernel@vger.kernel.org; e1000-de...@lists.sf.net Cc: supp...@supermicro.com; a...@solarrain.com Subject: X9SCM-F/82574L/e1000e lag / high latency (e1000e/Intel bug) If you have > 16GB do not upgrade to 2.0b, I am writing up a page on this and will post it shortly, the board will show 32GB with 2.0b but when it tries to address it, the machine reboots. Stick with 2.0a for now, I will call Supermicro later, will post a page on all of these issues in a bit. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: X9SCM-F/82574L/e1000e lag / high latency (e1000e/Intel bug)
-Original Message-
From: Justin Piszcz [mailto:jpis...@lucidpixels.com]
Sent: Tuesday, October 09, 2012 6:15 PM
To: linux-kernel@vger.kernel.org; e1000-de...@lists.sf.net
Cc: supp...@supermicro.com; a...@solarrain.com
Subject: X9SCM-F/82574L/e1000e lag / high latency (e1000e/Intel bug)

If you have > 16GB do not upgrade to 2.0b, I am writing up a page on this and will post it shortly, the board will show 32GB with 2.0b but when it tries to address it, the machine reboots. Stick with 2.0a for now, I will call Supermicro later, will post a page on all of these issues in a bit.

--

As promised:
https://sites.google.com/a/lucidpixels.com/web/blog/supermicrox9scm-fissues

Summary:

1. The board does not always address 32GB of RAM if you are using an Ivy Bridge chip on this motherboard.
2. e1000e network problems:
   a) During heavy network I/O (file copy) on eth0 the network latency jumps to 300-1000ms+ every 4-5 seconds (it does not do this on a separate card).
   b) During heavy network I/O (file copy) of over 600GB of files, the kernel disabled the network IRQ on eth0 and took the server offline from a network perspective (it has not done this on a separate card..yet).
3. Problems with PCI-e cards. Summary: don't expect to use all four PCI-e slots if they use a lot of power.
4. Clock drift issues. Summary: expect some strangeness if you use gpsd/a GPS to help sync your time, due to what SM noted below:
   http://lists.ntp.org/pipermail/pool/2012-July/006019.html

Here is a picture of memtest86 showing "Unexpected Interrupt - Halting" when the 2.0b BIOS is used with 32GB of RAM:
http://home.comcast.net/~jpiszcz/20121010/x9scm-web-small.jpg

Here is the issue with memtest86 (crashes/reboots host when you have 2.0b + 32GB of RAM):
https://www.youtube.com/watch?feature=player_embedded&v=M2TWO5kFm9U

Justin.
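To put numbers on the latency jumps described in this thread, a captured ping log can be scanned for outliers. This is a minimal sketch rather than anything used in the original report; the 100 ms threshold is an arbitrary assumption, and the regular expression only covers the `icmp_req=`/`icmp_seq=` line format shown in the ping output earlier in the thread.

```python
import re

# Matches the sequence number and round-trip time on one ping reply line.
TIME_RE = re.compile(r"icmp_(?:req|seq)=(\d+) .*?time=([0-9.]+) ms")


def find_spikes(ping_output, threshold_ms=100.0):
    """Return (sequence, latency_ms) pairs whose latency exceeds the threshold."""
    spikes = []
    for match in TIME_RE.finditer(ping_output):
        seq, latency = int(match.group(1)), float(match.group(2))
        if latency > threshold_ms:
            spikes.append((seq, latency))
    return spikes


# A few reply lines copied from the onboard 82574L test in this thread.
SAMPLE = """\
64 bytes from windowspc (192.168.0.1): icmp_req=7 ttl=128 time=0.696 ms
64 bytes from windowspc (192.168.0.1): icmp_req=8 ttl=128 time=1353 ms
64 bytes from windowspc (192.168.0.1): icmp_req=9 ttl=128 time=353 ms
64 bytes from windowspc (192.168.0.1): icmp_req=10 ttl=128 time=0.492 ms
"""

if __name__ == "__main__":
    for seq, latency in find_spikes(SAMPLE):
        print("icmp_req=%d took %.0f ms" % (seq, latency))
```

Running the same function over logs from the onboard port and from the add-in card gives a like-for-like count of spikes per test.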
3.10 kernel: [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1).
Hello, I use an ATI graphics card (PCI-e x1) for a server: Card: [AMD/ATI] Park [Mobility Radeon HD 5430] I upgraded my kernel from 3.9.x to 3.10, after rebooting I found the driver now wants a new firmware: radeon :05:00.0: radeon_uvd: Can't load firmware "radeon/CYPRESS_uvd.bin" I pulled the latest firmware down (and also updated the others) and rebooted: CONFIG_EXTRA_FIRMWARE="radeon/CEDAR_me.bin radeon/CEDAR_pfp.bin radeon/CEDAR_rlc.bin radeon/CYPRESS_uvd.bin" Then have this issue: [drm] radeon: irq initialized. [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1). The screen goes black for ~5 seconds and then come back on. Not sure if this has been reported or not (w/CEDAR chipset) but FYI. Kernel config: http://home.comcast.net/~jpiszcz/20130702/config-3.10-4.txt Full dmesg: http://home.comcast.net/~jpiszcz/20130702/full-dmesg.txt Possibly related: http://lists.opensuse.org/opensuse-bugs/2013-06/msg00053.html https://bugzilla.novell.com/show_bug.cgi?id=822777 https://bugzilla.novell.com/show_bug.cgi?id=822777#c0 http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg38066.html https://bugs.freedesktop.org/show_bug.cgi?id=63935 Snippet: [2.033535] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [3.053742] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [4.073985] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [5.094192] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [6.114405] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [7.134611] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [8.154824] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [9.175044] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [ 10.195251] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! 
[ 11.215464] [drm:r600_uvd_init] *ERROR* UVD not responding, trying to reset the VCPU!!! [ 11.235537] [drm:r600_uvd_init] *ERROR* UVD not responding, giving up!!! [ 11.235607] [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1). Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: 3.10 kernel: [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1).
-Original Message-
From: Alex Deucher [mailto:alexdeuc...@gmail.com]
Sent: Tuesday, July 02, 2013 2:36 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; dri-de...@lists.freedesktop.org
Subject: Re: 3.10 kernel: [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1).

> The screen goes black for ~5 seconds and then come back on.
> Not sure if this has been reported or not (w/CEDAR chipset) but FYI.

Make sure you have the latest CEDAR_rlc.bin in addition to CYPRESS_uvd.bin and make sure the latest images are available in your initrd or kernel image, etc.

Alex

--

Hi,

Even with the latest (confirmed below) and built-in to the kernel:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/dwmw2/linux-firmware.git
Cloning into 'linux-firmware'...
remote: Counting objects: 2210, done.
remote: Compressing objects: 100% (1102/1102), done.
remote: Total 2210 (delta 1123), reused 2113 (delta 1073)
Receiving objects: 100% (2210/2210), 39.84 MiB | 6.49 MiB/s, done.
Resolving deltas: 100% (1123/1123), done.

$ date
Tue Jul 2 14:39:19 EDT 2013

2b244d41832f46382bfbb8994522dcdd d/linux-firmware/radeon/CEDAR_me.bin
23915e382ea0d2f2491a19146ca3001c d/linux-firmware/radeon/CEDAR_pfp.bin
e8770d3d588f24dc6f1a8609c9db3467 d/linux-firmware/radeon/CEDAR_rlc.bin
fb23b281dcc94a035d374e709c9842bd d/linux-firmware/radeon/CYPRESS_uvd.bin

Current system firmware (/lib/firmware/radeon):

$ for i in $CONFIG_EXTRA_FIRMWARE; do md5sum $i; done
2b244d41832f46382bfbb8994522dcdd radeon/CEDAR_me.bin
23915e382ea0d2f2491a19146ca3001c radeon/CEDAR_pfp.bin
e8770d3d588f24dc6f1a8609c9db3467 radeon/CEDAR_rlc.bin
fb23b281dcc94a035d374e709c9842bd radeon/CYPRESS_uvd.bin

CONFIG_EXTRA_FIRMWARE="radeon/CEDAR_me.bin radeon/CEDAR_pfp.bin radeon/CEDAR_rlc.bin radeon/CYPRESS_uvd.bin"

Same issue.

Justin.
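The manual md5sum comparison in this thread (upstream linux-firmware checkout versus the files the kernel build picks up) can be scripted. A minimal sketch, assuming the two directory paths below match the layout used in the message; the function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_firmware(reference_dir, installed_dir, names):
    """Map each firmware name to 'match', 'differs', or 'missing'."""
    result = {}
    for name in names:
        ref = os.path.join(reference_dir, name)
        cur = os.path.join(installed_dir, name)
        if not (os.path.isfile(ref) and os.path.isfile(cur)):
            result[name] = "missing"
        elif md5_of(ref) == md5_of(cur):
            result[name] = "match"
        else:
            result[name] = "differs"
    return result


if __name__ == "__main__":
    # File list taken from the CONFIG_EXTRA_FIRMWARE line in the message.
    names = ("radeon/CEDAR_me.bin radeon/CEDAR_pfp.bin "
             "radeon/CEDAR_rlc.bin radeon/CYPRESS_uvd.bin").split()
    # Directory paths are assumptions; adjust to the local checkout.
    for name, status in compare_firmware("d/linux-firmware",
                                         "/lib/firmware", names).items():
        print(name, status)
```

A "match" for every file only shows the blobs on disk are identical; whether the kernel image was rebuilt with them is a separate check.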
3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard
Hello, Currently running 3.10 with: CONFIG_SENSORS_NCT6775=y Motherboard: Supermicro X9SRL-F A couple questions: 1) Was curious if the PCH CHIP/CPU/MCH temperatures should be populated for this board? 2) Additionally, why is the CPUTIN in alarm? I also found: http://www.lm-sensors.org/wiki/Configurations/SuperMicro/X9SRA Does Super Micro also have such a config file for the X9SRL-F? This board uses a NCT6776F. Relevant output from lm_sensors 3.6.0+dfsg1-1: nct6776-isa-0a30 Adapter: ISA adapter Vcore: +0.81 V (min = +0.54 V, max = +1.49 V) in1:+1.85 V (min = +1.62 V, max = +1.99 V) AVCC: +3.30 V (min = +2.98 V, max = +3.63 V) +3.3V: +3.30 V (min = +2.98 V, max = +3.63 V) in4:+1.51 V (min = +1.35 V, max = +1.65 V) in5:+1.27 V (min = +1.13 V, max = +1.38 V) in6:+1.06 V (min = +0.92 V, max = +1.34 V) 3VSB: +3.57 V (min = +2.98 V, max = +3.63 V) Vbat: +3.49 V (min = +2.70 V, max = +3.63 V) fan1: 986 RPM (min = 700 RPM) fan2: 1322 RPM (min = 700 RPM) fan3: 1103 RPM (min = 700 RPM) fan4: 1080 RPM (min = 700 RPM) fan5: 1001 RPM (min = 700 RPM) SYSTIN: +42.0 C (high = +75.0 C, hyst = +70.0 C) sensor = thermistor CPUTIN: +33.0 C (high = +95.0 C, hyst = +92.0 C) ALARM sensor = thermistor AUXTIN: +23.0 C (high = +80.0 C, hyst = +75.0 C) sensor = thermistor PECI Agent 0:+0.0 C (high = +95.0 C, hyst = +92.0 C) (crit = +100.0 C) PCH_CHIP_TEMP: +0.0 C PCH_CPU_TEMP:+0.0 C PCH_MCH_TEMP:+0.0 C intrusion0:ALARM intrusion1:ALARM sensors3.conf snippet: chip "w83627ehf-*" "w83627dhg-*" "w83667hg-*" "nct6775-*" "nct6776-*" label in0 "Vcore" label in2 "AVCC" label in3 "+3.3V" label in7 "3VSB" label in8 "Vbat" set in2_min 3.3 * 0.90 set in2_max 3.3 * 1.10 set in3_min 3.3 * 0.90 set in3_max 3.3 * 1.10 set in7_min 3.3 * 0.90 set in7_max 3.3 * 1.10 set in8_min 3.0 * 0.90 set in8_max 3.3 * 1.10 Justin. 
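The `set inN_min` / `set inN_max` lines in the sensors3.conf snippet above define a plus/minus 10% window around a nominal voltage. A minimal sketch of that arithmetic, so the configured limits can be compared with what `sensors` prints; the function name and the two-decimal rounding are assumptions made for illustration.

```python
def window(min_nominal, max_nominal, tolerance=0.10):
    """Return (low, high) limits: nominal minus/plus the given tolerance."""
    low = round(min_nominal * (1.0 - tolerance), 2)
    high = round(max_nominal * (1.0 + tolerance), 2)
    return low, high


if __name__ == "__main__":
    # in2/in3/in7 use 3.3 V for both limits; in8 uses 3.0 V for the minimum.
    print("in2:", window(3.3, 3.3))
    print("in8:", window(3.0, 3.3))
```

The computed lower limit for the 3.3 V rails is slightly below the +2.98 V shown in the sensors output; that small difference may simply be the chip rounding to its own register step, but that is a guess, not something confirmed in the thread.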
3.10: Intel HWMON/NIC temperature sensor question
Hi, I saw this in the device drive section and was curious which Intel-based NICs contain temperature sensors? Intel(R) 10GbE PCI Express adapters HWMON support Intel(R) PCI-Express Gigabit adapters HWMON support I checked the boards below and none appear to expose a hwmon interface: 08:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01) 08:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01) 08:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01) 08:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01) 09:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection 0a:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection 01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter (rev 01) # find /sys/|grep hwmon Radeon: /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0 /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/name /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/power /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/power/control /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/power/runtime _active_time /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/power/autosus pend_delay_ms /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/power/runtime _status /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/power/runtime _suspended_time /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/device /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/subsystem /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/uevent /sys/devices/pci:00/:00:03.1/:05:00.0/hwmon/hwmon0/temp1_input Onboard chipset: /sys/devices/platform/nct6775.2608/hwmon /sys/devices/platform/nct6775.2608/hwmon/hwmon2 /sys/devices/platform/nct6775.2608/hwmon/hwmon2/power /sys/devices/platform/nct6775.2608/hwmon/hwmon2/power/control 
/sys/devices/platform/nct6775.2608/hwmon/hwmon2/power/runtime_active_time /sys/devices/platform/nct6775.2608/hwmon/hwmon2/power/autosuspend_delay_ms /sys/devices/platform/nct6775.2608/hwmon/hwmon2/power/runtime_status /sys/devices/platform/nct6775.2608/hwmon/hwmon2/power/runtime_suspended_time /sys/devices/platform/nct6775.2608/hwmon/hwmon2/device /sys/devices/platform/nct6775.2608/hwmon/hwmon2/subsystem /sys/devices/platform/nct6775.2608/hwmon/hwmon2/uevent CPU: /sys/devices/platform/coretemp.0/hwmon /sys/devices/platform/coretemp.0/hwmon/hwmon1 /sys/devices/platform/coretemp.0/hwmon/hwmon1/power /sys/devices/platform/coretemp.0/hwmon/hwmon1/power/control /sys/devices/platform/coretemp.0/hwmon/hwmon1/power/runtime_active_time /sys/devices/platform/coretemp.0/hwmon/hwmon1/power/autosuspend_delay_ms /sys/devices/platform/coretemp.0/hwmon/hwmon1/power/runtime_status /sys/devices/platform/coretemp.0/hwmon/hwmon1/power/runtime_suspended_time /sys/devices/platform/coretemp.0/hwmon/hwmon1/device /sys/devices/platform/coretemp.0/hwmon/hwmon1/subsystem /sys/devices/platform/coretemp.0/hwmon/hwmon1/uevent Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
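Rather than grepping all of /sys, the registered hwmon devices can be listed by driver name to see at a glance whether any NIC driver registered one. This is a minimal sketch, assuming the standard /sys/class/hwmon directory; because some drivers of this kernel era put their attributes on the parent device rather than on the hwmonN node (as the nct6775 and coretemp listings above suggest), it looks in both places.

```python
import glob
import os


def list_hwmon(sysfs_class_dir="/sys/class/hwmon"):
    """Return {hwmonN: {"name": ..., "temps": [...]}} for each hwmon device."""
    devices = {}
    pattern = os.path.join(sysfs_class_dir, "hwmon*")
    for dev_dir in sorted(glob.glob(pattern)):
        # Attributes may live on the hwmon node or on its parent device.
        search_dirs = [dev_dir, os.path.join(dev_dir, "device")]
        name = None
        temps = set()
        for directory in search_dirs:
            name_path = os.path.join(directory, "name")
            if name is None and os.path.isfile(name_path):
                with open(name_path) as handle:
                    name = handle.read().strip()
            for path in glob.glob(os.path.join(directory, "temp*_input")):
                temps.add(os.path.basename(path))
        devices[os.path.basename(dev_dir)] = {"name": name,
                                              "temps": sorted(temps)}
    return devices


if __name__ == "__main__":
    for dev, info in list_hwmon().items():
        print(dev, info["name"], info["temps"])
```

If no entry carries a NIC driver name, the adapter or driver is not exposing a sensor on that system; which adapters actually have one is not something this listing can answer.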
RE: 3.10 kernel: [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1). [SOLVED]
-----Original Message-----
From: Alex Deucher [mailto:alexdeuc...@gmail.com]
Sent: Tuesday, July 02, 2013 3:41 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; dri-de...@lists.freedesktop.org
Subject: Re: 3.10 kernel: [drm:evergreen_startup] *ERROR* radeon: error initializing UVD (-1).

Please open a bug (product: DRI, component: DRM/Radeon): https://bugs.freedesktop.org and attach your dmesg output and xorg log.

Alex

Per: https://bugs.freedesktop.org/show_bug.cgi?id=66519

Grabbed firmware, removed distribution firmware package, re-built kernel and all is working now, thanks!

Justin.
RE: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard
-----Original Message-----
From: Guenter Roeck [mailto:li...@roeck-us.net]
Sent: Wednesday, July 03, 2013 10:42 AM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; lm-sens...@lm-sensors.org
Subject: Re: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard

[ .. ]

This is surprising and might be where the alarm comes from. What output do you get if you load the coretemp driver ?

coretemp-isa-
Adapter: ISA adapter
Physical id 0: +38.0 C (high = +81.0 C, crit = +91.0 C)
Core 0: +36.0 C (high = +81.0 C, crit = +91.0 C)
Core 1: +35.0 C (high = +81.0 C, crit = +91.0 C)
Core 2: +35.0 C (high = +81.0 C, crit = +91.0 C)
Core 3: +34.0 C (high = +81.0 C, crit = +91.0 C)
Core 4: +37.0 C (high = +81.0 C, crit = +91.0 C)
Core 5: +38.0 C (high = +81.0 C, crit = +91.0 C)

> PCH_CHIP_TEMP: +0.0 C
> PCH_CPU_TEMP: +0.0 C
> PCH_MCH_TEMP: +0.0 C
> intrusion0: ALARM
> intrusion1: ALARM

Are those not connected ?

The intrusion headers are not connected. Also, I have not dug into it, but when I try to ignore PECI or the PCH* entries, lm_sensors seems to ignore the rule.

sensors3.conf:
ignore PCH_CHIP_TEMP
ignore PCH_CPU_TEMP
ignore PCH_MCH_TEMP

$ sensors |tail -n 4
PCH_CHIP_TEMP: +0.0 C
PCH_CPU_TEMP: +0.0 C
PCH_MCH_TEMP: +0.0 C

Ignoring intrusion works though:
ignore intrusion0
ignore intrusion1

nct6776-isa-0a30
Adapter: ISA adapter
Vcore: +0.84 V (min = +0.54 V, max = +1.49 V)
in1: +1.84 V (min = +1.62 V, max = +1.99 V)
AVCC: +3.28 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.28 V (min = +2.98 V, max = +3.63 V)
in4: +1.50 V (min = +1.35 V, max = +1.65 V)
in5: +1.26 V (min = +1.13 V, max = +1.38 V)
in6: +1.06 V (min = +0.92 V, max = +1.34 V)
3VSB: +3.57 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.47 V (min = +2.70 V, max = +3.63 V)
fan1: 1007 RPM (min = 700 RPM)
fan2: 1317 RPM (min = 700 RPM)
fan3: 1102 RPM (min = 700 RPM)
fan4: 1059 RPM (min = 700 RPM)
fan5: 998 RPM (min = 700 RPM)
SYSTIN: +40.0 C (high = +75.0 C, hyst = +70.0 C) sensor = thermistor
CPUTIN: +31.5 C (high = +95.0 C, hyst = +92.0 C) ALARM sensor = thermistor
AUXTIN: +23.0 C (high = +80.0 C, hyst = +75.0 C) sensor = thermistor
PECI Agent 0: +0.0 C (high = +95.0 C, hyst = +92.0 C) (crit = +100.0 C)
PCH_CHIP_TEMP: +0.0 C
PCH_CPU_TEMP: +0.0 C
PCH_MCH_TEMP: +0.0 C

Thanks,
Guenter
RE: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard
-----Original Message-----
From: Guenter Roeck [mailto:li...@roeck-us.net]
Sent: Wednesday, July 03, 2013 12:33 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; lm-sens...@lm-sensors.org
Subject: Re: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard

[ .. ]

> Can you install superiotool and run "sudo superiotool -V -e" ? I would like to see raw data from the superio chip.

# superiotool -V -e
superiotool r6637
..
Probing for Nuvoton Super I/O (sid=0xfc) at 0x164e...
Found Nuvoton WPCM450 (id=0x1a11, rev=0x00) at 0x164e
Probing for Nuvoton Super I/O at 0x2e...
Found Nuvoton NCT6776F (C) (id=0xc333) at 0x2e

[ .. ]

You have to specify the raw attribute names, not the symbolic ones. You see the raw attribute names with "sensors -u".

# sensors -u
PCH_CHIP_TEMP:
  temp8_input: 0.000
PCH_CPU_TEMP:
  temp9_input: 0.000
PCH_MCH_TEMP:
  temp10_input: 0.000

Tried:
ignore PCH_CHIP_TEMP
ignore temp8_input

For some reason I still can't stop it from appearing:
PCH_CHIP_TEMP: +0.0 C
(..as well as the others)

> $ sensors |tail -n 4
> PCH_CHIP_TEMP: +0.0 C
> PCH_CPU_TEMP: +0.0 C
> PCH_MCH_TEMP: +0.0 C
>
> Ignoring intrusion works though:
> ignore intrusion0
> ignore intrusion1
>

Does the board have intrusion detection headers ? If so, you could close (bridge) the header(s) which should get rid of the alarm.

ftp://ftp.supermicro.com/CDR-X9-UP_1.21_for_Intel_X9_UP_platform/MANUALS/X9SRL-F/X9SRL-F.pdf @ page 49 (2-23) [motherboard: JL1 2-pin header]
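The label-to-raw-name lookup suggested above can be sketched in a few lines. This is a hedged illustration, not the lm-sensors way of doing it: the helper names are hypothetical, and it assumes (from my reading of sensors.conf(5), worth verifying) that an "ignore" statement expects the feature name such as temp8 rather than the subfeature temp8_input, and that it has to sit under a matching "chip" statement. The chip pattern "nct6776-*" is likewise an assumption derived from the nct6776-isa-0a30 chip name in the thread.

```python
import re

# Matches an indented subfeature line such as "  temp8_input: 0.000".
SUBFEATURE = re.compile(r"^\s+([a-z]+\d+)_[a-z_0-9]+:\s")


def label_to_feature(sensors_u_text):
    """Map each symbolic label in `sensors -u` output to its raw feature name."""
    mapping = {}
    label = None
    for line in sensors_u_text.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            # An unindented line ending in ":" starts a new feature label.
            label = line.rstrip()[:-1]
            continue
        match = SUBFEATURE.match(line)
        if match and label is not None:
            mapping.setdefault(label, match.group(1))
    return mapping


def ignore_block(chip_pattern, sensors_u_text, labels):
    """Build a sensors3.conf fragment that ignores the given labels."""
    mapping = label_to_feature(sensors_u_text)
    lines = ['chip "%s"' % chip_pattern]
    for label in labels:
        lines.append("    ignore %s" % mapping[label])
    return "\n".join(lines)
```

Fed the `sensors -u` excerpt above, `label_to_feature` would pair PCH_CHIP_TEMP with temp8, PCH_CPU_TEMP with temp9 and PCH_MCH_TEMP with temp10, which is a different spelling from both variants tried in the message (the label and temp8_input).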
RE: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard
-Original Message- From: Guenter Roeck [mailto:li...@roeck-us.net] Sent: Wednesday, July 03, 2013 1:17 PM To: Justin Piszcz Cc: linux-kernel@vger.kernel.org; lm-sens...@lm-sensors.org Subject: Re: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard [ .. ] Can you try http://roeck-us.net/linux/bin/superiotool ? Sure, below: $ sudo ./superiotool superiotool r4.0-2514-gf419483 Found Nuvoton WPCM450 (id=0x1a11, rev=0x00) at 0x164e Found Nuvoton NCT6776F (C) (id=0xc333) at 0x2e $ sudo ./superiotool -V -e Probing for Nuvoton Super I/O (sid=0xfc) at 0x164e... Found Nuvoton WPCM450 (id=0x1a11, rev=0x00) at 0x164e Probing for Nuvoton Super I/O at 0x2e... Found Nuvoton NCT6776F (C) (id=0xc333) at 0x2e Bank 0: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00: 03 33 03 33 00 ff ff ff ff ff ff ff ff ff ff ff 01: 04 ff 00 00 00 00 ff ff 00 00 00 00 00 83 00 00 02: 69 e6 ce ce bc 9e 84 2b ff ff ff ba 43 f9 cb e3 03: ba e3 ba ce a9 ac 8d a8 73 4b 46 ff ff ff ff ff 04: 03 fe 58 ff ff 80 3f ff 2d ff ff ff 10 05 00 a3 05: ff ff ff ff ff ff ff ff c1 ff ff ff ff 01 00 ff 06: 00 ff ff ff ff 01 07 ff ff ff ff ff ff ff ff ff 07: 00 0a 00 21 00 00 00 17 00 ff ff ff ff ff ff ff 08: ff 03 1f 0f ff 3c 3c 3c 00 00 00 00 00 00 00 00 09: 0a 00 00 00 00 0a 0a 0a 0a aa ef 80 ff 40 46 c4 0a: 0e 01 00 00 ff 00 00 ff 00 00 80 66 66 06 01 01 0b: 00 00 00 00 02 01 35 00 1c 00 00 04 38 c0 c4 ff 0c: 01 00 00 00 00 00 00 00 00 07 02 ff ff ff ff ff 0d: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0e: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Bank 1: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00: 82 00 00 01 01 28 01 3c ff 33 ff ff 00 ff ff ff 01: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02: 01 23 28 37 37 ff ff 28 51 ff ff ff ff ff ff ff 03: ff 00 ff ff 00 55 ff ff 05 01 00 00 00 00 00 00 04: ff ff ff ff ff d9 03 ff ff ff ff ff ff ff 01 ff 05: 00 00 00 5c 00 5f 00 ff ff ff ff ff ff ff ff ff 06: ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff 07: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 08: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 09: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0b: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0c: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0d: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0e: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Bank 2: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00: 8c 32 00 01 01 33 01 3c ff 33 ff ff 00 ff ff ff 01: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02: 01 32 5a 5a 5a ff ff 33 ff ff ff ff ff ff ff ff 03: ff 00 ff ff 00 5f ff ff 03 01 00 00 00 00 00 00 04: ff ff ff ff ff 2c 05 ff ff ff ff ff ff ff 02 ff 05: 21 00 00 5c 00 5f 00 ff ff ff ff ff ff ff ff ff 06: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 07: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 08: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 09: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0b: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0c: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0d: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0e: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Bank 3: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00: 03 00 00 0a 0a 01 01 3c ff ff ff ff 00 ff ff ff 01: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02: 00 19 23 2d 37 ff ff 8c aa c8 e6 ff ff ff ff ff 03: ff 00 ff ff 00 3c ff ff 00 01 00 00 00 00 00 00 04: ff ff ff ff ff 48 04 ff ff ff ff ff ff ff 03 ff 05: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 06: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 07: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 08: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 09: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a: ff ff ff ff ff ff ff ff 
ff ff ff ff ff ff ff ff 0b: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0c: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0d: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0e: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Bank 4: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01: 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff ff 02: ff ff ff 96 64 96 64 e1 96 ff ff ff ff ff ff ff 03: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 04: 3f 00 03 ff ff ff ff ff ff ff ff ff ff ff 04 ff 05: 31 13 ff ff 00 00 00 ff 00 20 12 00 09 ff ff ff 06: ff 01 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 07: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 08: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 09: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a: ff ff ff ff ff ff ff
RE: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard
Re-sending as text. From: Justin Piszcz [mailto:jpis...@lucidpixels.com] Sent: Wednesday, July 03, 2013 5:00 PM To: 'Guenter Roeck' Cc: linux-kernel@vger.kernel.org; lm-sens...@lm-sensors.org Subject: RE: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard -Original Message- From: Guenter Roeck [mailto:li...@roeck-us.net] Sent: Wednesday, July 03, 2013 4:57 PM To: Justin Piszcz Cc: linux-kernel@vger.kernel.org; lm-sens...@lm-sensors.org Subject: Re: [lm-sensors] 3.10: NCT6776F sensor question with Supermicro X9SRL-F motherboard [ .. ] > 09: 0a 00 00 00 00 0a 0a 0a 0a aa ef 80 ff 40 46 c4 > 0a: 0e 01 00 00 ff 00 00 ff 00 00 80 66 66 06 01 01 ^^ This shows that PECI Agent 0 is supposed to be enabled. > Bank 2: > 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f > 00: 8c 32 00 01 01 33 01 3c ff 33 ff ff 00 ff ff ff ^^ This value suggests that the second temperature sensor (the one creating the alarm) is supposed to be the PECI source (which reports the CPU temperature to the NCT6776), and that it is supposed to be used to control the speed of the CPU fan. ^^ Fan control is in manual mode. Did you set this ? It is quite unusual. Setting: Current FAN Mode is Optimal. Set Fan to Standard Speed Set Fan to Full Speed Set Fan to Optimal Speed > Bank 7: > 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f > 00: ff 95 02 10 00 00 00 00 00 64 00 00 00 00 00 00 > 01: 00 00 00 00 00 00 00 f8 80 f8 80 f8 80 f8 80 00 ^^ 00 here shows that the PECI source is not active, ie the CPU does not deliver PECI data to the NCT6776. This explains the alarm. Practical impact is probably limited as fan control is configured to be manual anyway, but I wonder why PECI doesn't work on your board. PECI configuration is identical to my Supermicro board. 
Guenter
3.9.8: cannot boot with iommu=on using 6gbps highpoint sata card
Hello, This bug I reported in November 2012: http://www.kernelhub.org/?p=2&msg=170797 http://home.comcast.net/~jpiszcz/20121128/highpoint.jpg Some discussion on the patches: http://www.kernelhub.org/?p=2&msg=172597 http://www.kernelhub.org/?p=2&msg=171846 http://www.kernelhub.org/?p=2&msg=172346 My current choices: 1. Stick with old kernel (3.7.x) + patch, which works w/IOMMU=on and use 6gbps SATA board. 2. Put drive back on motherboard SATA2 and the machine will boot properly with any kernel. 3. I would prefer to use 6gbps if possible. There is still a problem with 3.9.8: (below) Rebooting to new kernel (3.9.8) from the 3.7.x kernel w/ patch provided earlier: Jun 28 07:48:54 p34 [ 5809.227191] EXT4-fs (sdc2): re-mounted. Opts: (null) Jun 28 07:48:56 p34 [ 5811.283882] sd 7:0:0:0: [sdc] Synchronizing SCSI cache Jun 28 07:48:56 p34 [ 5811.284402] sd 0:0:1:0: [sdb] Synchronizing SCSI cache Jun 28 07:48:56 p34 [ 5811.284522] sd 0:0:0:0: [sda] Synchronizing SCSI cache Jun 28 07:48:56 p34 [ 5811.284911] 3w-sas: Shutting down host 0. Jun 28 07:49:03 p34 [ 5818.107989] 3w-sas: Shutdown complete. 
Jun 28 07:50:50 p34 [6.127384] usb 6-2.1: new full-speed USB device number 4 using uhci_hcd Jun 28 07:50:50 p34 [6.248468] usb 6-2.1: New USB device found, idVendor=413c, idProduct=2002 Jun 28 07:50:50 p34 [6.249011] usb 6-2.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Jun 28 07:50:50 p34 [6.250654] usb 6-2.1: Product: Dell USB Keyboard Hub Jun 28 07:50:50 p34 [6.252476] usb 6-2.1: Manufacturer: Dell Jun 28 07:50:50 p34 [6.264087] input: Dell Dell USB Keyboard Hub as /devices/pci:00/:00:1d.0/usb6/6-2/6-2.1/6-2.1:1.0/input/input5 Jun 28 07:50:50 p34 [6.265041] hid-generic 0003:413C:2002.0004: input,hidraw3: USB HID v1.10 Keyboard [Dell Dell USB Keyboard Hub] on usb-:00:1d.0-2.1/input0 Jun 28 07:50:50 p34 [6.274732] input: Dell Dell USB Keyboard Hub as /devices/pci:00/:00:1d.0/usb6/6-2/6-2.1/6-2.1:1.1/input/input6 Jun 28 07:50:50 p34 [6.275752] hid-generic 0003:413C:2002.0005: input,hidraw4: USB HID v1.10 Device [Dell Dell USB Keyboard Hub] on usb-:00:1d.0-2.1/input1 Jun 28 07:50:50 p34 [6.343561] usb 6-2.3: new low-speed USB device number 5 using uhci_hcd Jun 28 07:50:50 p34 [6.476670] usb 6-2.3: New USB device found, idVendor=045e, idProduct=0040 Jun 28 07:50:50 p34 [6.477220] usb 6-2.3: New USB device strings: Mfr=1, Product=3, SerialNumber=0 Jun 28 07:50:50 p34 [6.479064] usb 6-2.3: Product: Microsoft 3-Button Mouse with IntelliEye(TM) Jun 28 07:50:50 p34 [6.480944] usb 6-2.3: Manufacturer: Microsoft Jun 28 07:50:50 p34 [6.500368] input: Microsoft Microsoft 3-Button Mouse with IntelliEye(TM) as /devices/pci:00/:00:1d.0/usb6/6-2/6-2.3/6-2.3:1.0/input/input7 Jun 28 07:50:50 p34 [6.501539] hid-generic 0003:045E:0040.0006: input,hidraw5: USB HID v1.10 Mouse [Microsoft Microsoft 3-Button Mouse with IntelliEye(TM)] on usb-:00:1d.0-2.3/input0 Jun 28 07:50:52 p34 [7.840750] ata14.00: qc timeout (cmd 0xa1) Jun 28 07:50:52 p34 [7.848753] ata7.00: qc timeout (cmd 0xec) Jun 28 07:50:52 p34 [7.849304] ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4) Jun 28 
07:50:52 p34 [8.156029] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jun 28 07:50:52 p34 [8.341165] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4) Jun 28 07:50:53 p34 [9.146828] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Jun 28 07:51:02 p34 [ 18.164175] ata7.00: qc timeout (cmd 0xec) Jun 28 07:51:02 p34 [ 18.164740] ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4) Jun 28 07:51:02 p34 [ 18.166487] ata7: limiting SATA link speed to 3.0 Gbps Jun 28 07:51:02 p34 [ 18.473437] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320) Jun 28 07:51:03 p34 [ 19.154971] ata14.00: qc timeout (cmd 0xa1) Jun 28 07:51:04 p34 [ 19.655389] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4) Jun 28 07:51:04 p34 [ 19.656004] ata14: limiting SATA link speed to 1.5 Gbps Jun 28 07:51:04 p34 [ 20.463046] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Jun 28 07:51:32 p34 [ 48.497977] ata7.00: qc timeout (cmd 0xec) Jun 28 07:51:32 p34 [ 48.498547] ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4) Jun 28 07:51:33 p34 [ 48.805177] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320) Jun 28 07:51:34 p34 [ 50.487585] ata14.00: qc timeout (cmd 0xa1) Jun 28 07:51:35 p34 [ 50.987948] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4) Jun 28 07:51:36 p34 [ 51.793613] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Jun 28 07:51:36 p34 [ 52.294202] VFS: Cannot open root device "822" or unknown-block(8,34): error -6 Jun 28 07:51:36 p34 [ 52.294770] Please append a correct "root=" boot option; here are the available partitions: Jun 28 07:51:36 p34 [ 52.296927] 0800 29296762880 sda Jun 28 07:51:36 p34 driver: sd Jun 28 07:5
Re: 3.10: discard/trim support on md-raid1?
Thanks for the replies,

After some further testing..

When I ran a repair on the md's sync_action, the system would reduce I/O to the RAID-1 to 14kb/s or even less when it hit a certain number of blocks and effectively locked the system every time.

It turned out to be a bad SSD (it also failed Intel's Secure Erase); I RMA'd it. Interesting though that it did not drop out of the array but froze the system (the failure scenario was odd).

Justin.

On Tue, Jul 16, 2013 at 3:15 AM, NeilBrown wrote:
> On Sat, 13 Jul 2013 06:34:19 -0400 "Justin Piszcz" wrote:
>
>> Hello,
>>
>> Running 3.10 and I see the following for an md-raid1 of two SSDs:
>>
>> Checking /sys/block/md1/queue:
>> add_random: 0
>> discard_granularity: 512
>> discard_max_bytes: 2147450880
>> discard_zeroes_data: 0
>> hw_sector_size: 512
>> iostats: 0
>> logical_block_size: 512
>> max_hw_sectors_kb: 32767
>> max_integrity_segments: 0
>> max_sectors_kb: 512
>> max_segment_size: 65536
>> max_segments: 168
>> minimum_io_size: 512
>> nomerges: 0
>> nr_requests: 128
>> optimal_io_size: 0
>> physical_block_size: 512
>> read_ahead_kb: 8192
>> rotational: 1
>> rq_affinity: 0
>> scheduler: none
>> write_same_max_bytes: 0
>>
>> What should be seen:
>> rotational: 0
>
> What has "rotational" got to do with "supports discard"?
> There may be some correlation, but it isn't causal.
>
>> And possibly:
>> discard_zeroes_data: 1
>
> This should be set as the 'or' of the same value from component devices. And
> does not enable or disable the use of discard.
>
> I don't think that "does this device support discard" appears in sysfs.
>
> I believe trim does work on md/raid1 if the underlying devices all support it.
>
> NeilBrown
>
>>
>> Can anyone confirm if there is a workaround to allow TRIM when using
>> md-raid1?
>>
>> Some related discussion here:
>> http://us.generation-nt.com/answer/md-rotational-attribute-help-206571222.html
>> http://www.progtown.com/topic343938-ssd-strange-itself-conducts.html
>>
>> Justin.
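The queue attributes quoted in this message can be read programmatically rather than eyeballed. The sketch below is an illustration only: the function names are hypothetical, and the rule it applies (a discard_max_bytes of 0 means the device does not accept discards, any other value means it does) is my understanding of the kernel's block queue sysfs documentation, stated here as an assumption. It deliberately ignores "rotational", in line with the reply that the two are not causally related.

```python
import os


def read_queue_attrs(queue_dir, names=("discard_granularity",
                                       "discard_max_bytes",
                                       "rotational")):
    """Read selected integer attributes from a /sys/block/<dev>/queue directory."""
    attrs = {}
    for name in names:
        try:
            with open(os.path.join(queue_dir, name)) as handle:
                attrs[name] = int(handle.read().strip())
        except (OSError, ValueError):
            attrs[name] = None  # attribute missing or not a plain integer
    return attrs


def supports_discard(attrs):
    """Assumption: a non-zero discard_max_bytes means discards are accepted."""
    return bool(attrs.get("discard_max_bytes"))


if __name__ == "__main__":
    queue = read_queue_attrs("/sys/block/md1/queue")
    print(queue, "discard:", supports_discard(queue))
```

With the values posted for md1 (discard_max_bytes: 2147450880, rotational: 1) this check would report discard as available even though the array is flagged rotational.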
RE: 3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)
-----Original Message-----
From: NeilBrown [mailto:ne...@suse.de]
Sent: Monday, July 29, 2013 1:57 AM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org
Subject: Re: 3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)

On Fri, 26 Jul 2013 05:56:51 -0400 "Justin Piszcz" wrote:

[..]

Further testing shows all is ok now:

Sun Nov 25 02:12:03 EST 2012: Parity check(s) running, sleeping 60 seconds...
Sun Nov 25 02:13:03 EST 2012: Parity check(s) running, sleeping 60 seconds...
Sun Nov 25 02:14:03 EST 2012: cat /sys/block/md0/md/mismatch_cnt
Sun Nov 25 02:14:03 EST 2012: 0
Sun Nov 25 02:14:03 EST 2012: cat /sys/block/md1/md/mismatch_cnt
Sun Nov 25 02:14:03 EST 2012: 0
Sun Nov 25 02:14:03 EST 2012: The meta-device /dev/md0 has no mismatched sectors.
Sun Nov 25 02:14:04 EST 2012: The meta-device /dev/md1 has no mismatched sectors.
Sun Nov 25 02:14:05 EST 2012: All devices are clean...
Sun Nov 25 02:14:05 EST 2012: cat /sys/block/md0/md/mismatch_cnt
Sun Nov 25 02:14:05 EST 2012: 0
Sun Nov 25 02:14:05 EST 2012: cat /sys/block/md1/md/mismatch_cnt
Sun Nov 25 02:14:05 EST 2012: 0

Thanks for your help.

Justin.
3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)
Hi,

When I run repair on an MD-RAID1 sync_action, the speed drops and stays like this (below) for hours. The system is then completely unresponsive to user input.

I have replaced a failing SSD; however, after a check, mismatch_cnt seems to increase over time. When I run repair, the system stops responding to user input. Has anyone else run into this issue with a RAID-1 volume (2 x SSD) using 0.90 metadata? Long ago I used this same configuration with two physical disks and there was never a problem.

Even though I left a root shell open, this does nothing to interrupt the resync:

# echo idle > /sys/devices/virtual/block/md1/md/sync_action

Every 1.0s: cat /proc/mdstat    Sun Jul 21 06:15:38 2013

Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
[>] resync = 0.0% (151616/233381376) finish=36171.5min speed=107K/sec
md0 : active raid1 sdc1[0] sdb1[1]
1048512 blocks [2/2] [UU]
unused devices: <none>

10 minutes later:

233381376 blocks [2/2] [UU]
[>] resync = 0.0% (151616/233381376) finish=52219.3min speed=74K/sec

The position where it hangs (151616 here) has been different each time I watched it; it does not appear to hang at the same block each time.

Justin.
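The stall described in this report (the same block counter ten minutes apart while the estimated finish time keeps growing) can be detected by comparing two samples of /proc/mdstat. The following is a rough sketch under the assumption that progress lines look like the ones pasted in the message; the regular expression and the function names are illustrative and not part of any md tooling.

```python
import re

# Matches e.g. "resync = 0.0% (151616/233381376) finish=36171.5min speed=107K/sec".
PROGRESS = re.compile(
    r"(?P<action>resync|recovery|check|repair)\s*=\s*(?P<percent>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_progress(mdstat_text):
    """Return one dict per progress line found in /proc/mdstat text."""
    results = []
    for match in PROGRESS.finditer(mdstat_text):
        info = match.groupdict()
        results.append({
            "action": info["action"],
            "percent": float(info["percent"]),
            "done": int(info["done"]),
            "total": int(info["total"]),
            "finish_min": float(info["finish"]),
            "speed_k_per_s": int(info["speed"]),
        })
    return results


def looks_stalled(earlier, later):
    """True when no blocks were processed between two samples."""
    return later["done"] <= earlier["done"]
```

Applied to the two samples in the message, both report 151616 blocks done, so `looks_stalled` would flag the resync as stuck even though mdstat still shows a non-zero speed.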
RE: 3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)
-----Original Message-----
From: NeilBrown [mailto:ne...@suse.de]
Sent: Sunday, July 21, 2013 7:03 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org
Subject: Re: 3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)

> Hi Justin,
> this is a known bug. Fix has been accepted into mainline for 3.11-rc2.
> Hopefully it will get into 3.10.3 (too late for 3.10.2).
> NeilBrown

Hi Neil,

Did the fix by chance make it into 3.10.3? The same issue occurs with 3.10.3 for me as well:

Every 1.0s: cat /proc/mdstat    Thu Jul 25 19:09:46 2013

Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
[>] resync = 0.0% (151488/233381376) finish=32045.3min speed=121K/sec
md0 : active raid1 sdc1[0] sdb1[1]
1048512 blocks [2/2] [UU]
unused devices: <none>

Justin.
RE: 3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)
-----Original Message-----
From: NeilBrown [mailto:ne...@suse.de]
Sent: Thursday, July 25, 2013 8:36 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; linux-r...@vger.kernel.org
Subject: Re: 3.10.1: echo repair > sync_action causes hang on RAID-1 (2 x SSD)

On Thu, 25 Jul 2013 19:10:50 -0400 "Justin Piszcz" wrote:

> Did the fix by chance make it into 3.10.3?

No, it looks like it missed again. I gather there was a large inflow of patches for -stable in the 3.11-rc1 merge window and Greg has been processing them in batches. Hopefully in 3.10.4.

The relevant patch is commit 30bc9b53878a9921b02e3 in mainline.

NeilBrown

--

Method to get patch via git and patch kernel:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
$ git log |grep 30bc9b53878a9921b02e3
commit 30bc9b53878a9921b02e3b5bc4283ac1c6de102a
$ git show 30bc9b53878a9921b02e3b5bc4283ac1c6de102a > /tmp/a
# patch -p1 < /tmp/a
patching file drivers/md/raid1.c
Hunk #1 succeeded at 1848 (offset -1 lines).
Hunk #2 succeeded at 1886 (offset -1 lines).
Hunk #3 succeeded at 1915 (offset -1 lines).

Reboot-tested, success, thanks..!

One follow-up question:

$ cat /sys/block/md1/md/mismatch_cnt
314112

-> On a live RAID-1 (root filesystem) without swap, is it normal to have such a high mismatch_cnt even after a repair?

First repair:
Fri Jul 26 05:30:47 EDT 2013: The meta-device /dev/md1 has mismatch_cnt 314112 sectors.

Second repair:
Fri Jul 26 05:30:47 EDT 2013: The meta-device /dev/md1 has mismatch_cnt 313600 sectors.

Should I be concerned?

Testing the patch:

Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
[>] check = 0.3% (838976/233381376) finish=9.2min speed=419488K/sec
md0 : active raid1 sdc1[0] sdb1[1]
1048512 blocks [2/2] [UU]

Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
[===>.] check = 77.5% (180889856/233381376) finish=2.5min speed=342654K/sec
md0 : active raid1 sdc1[0] sdb1[1]
1048512 blocks [2/2] [UU]

Personalities : [raid1]
md1 : active raid1 sdc2[0] sdb2[1]
233381376 blocks [2/2] [UU]
md0 : active raid1 sdc1[0] sdb1[1]
1048512 blocks [2/2] [UU]

Justin.
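To put the mismatch_cnt figures from this message into perspective, the sector counts can be converted to a size. This is a small sketch under one stated assumption: that mismatch_cnt counts 512-byte sectors, as the "sectors" wording in the check script output suggests. The helper names are hypothetical.

```python
import os

SECTOR_BYTES = 512  # assumption: mismatch_cnt is counted in 512-byte sectors


def read_mismatch_cnt(md_name, sys_block="/sys/block"):
    """Return mismatch_cnt for an md array, e.g. md_name="md1"."""
    path = os.path.join(sys_block, md_name, "md", "mismatch_cnt")
    with open(path) as handle:
        return int(handle.read().strip())


def sectors_to_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to MiB."""
    return sectors * sector_bytes / (1024.0 * 1024.0)


if __name__ == "__main__":
    count = read_mismatch_cnt("md1")
    print("md1: %d sectors, about %.1f MiB" % (count, sectors_to_mib(count)))
```

Under that assumption, 314112 sectors corresponds to roughly 153.4 MiB, and the second repair lowered the count by only 512 sectors (0.25 MiB).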
RE: [PATCH] Quirk to support Marvell 88SE91xx SATA controllers with Intel IOMMU.
-Original Message- From: Andrew Cooks [mailto:aco...@gmail.com] Sent: Friday, March 01, 2013 3:26 AM To: aco...@gmail.com; j...@8bytes.org; xjtuy...@hotmail.com; gm.y...@gmail.com; alex.william...@redhat.com; bhelg...@google.com; jpis...@lucidpixels.com; dw...@infradead.org Cc: open list:INTEL IOMMU (VT-d); open list; open list:PCI SUBSYSTEM Subject: [PATCH] Quirk to support Marvell 88SE91xx SATA controllers with Intel IOMMU. This is my third submitted patch to make Marvell 88SE91xx SATA controllers work when IOMMU is enabled.[1][2] What's changed: * Adopt David Woodhouse's terminology by referring to the quirky functions as 'ghost' functions. * Unmap ghost functions when device is detached from IOMMU. * Stub function for when CONFIG_PCI_QUIRKS is not enabled. The bad: * Still no AMD support. * The table of affected chip IDs is as complete as I can make it by googling for bug reports. This patch was generated against commit b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50, but will also apply cleanly to 3.7.10. -- Hi, Against 3.7.10: # patch -p1 < ../RFC-Fix-Intel-IOMMU-support-for-Marvell-88SE91xx-SATA-controllers..patch patching file drivers/iommu/intel-iommu.c patching file drivers/pci/quirks.c Hunk #1 succeeded at 3230 (offset 3 lines). patching file include/linux/pci.h # Recompile kernel, reboot.. Shutdown host, re-attach to Marvell Controller w/IOMMU. The host still failed to boot, dmesg/panic here: http://home.comcast.net/~jpiszcz/20130301/boot_failure.JPG (The root disk is /dev/sdc) I recompiled again with IOMMU off and it booted ok: # uname -a Linux host 3.7.10 #2 SMP Fri Mar 1 12:44:25 EST 2013 x86_64 GNU/Linux Here is the part of dmesg (what it looks like when it succeeds with IOMMU=off) [4.288113] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci:00/:00:1a.1/usb4/4-2/4-2:1.0/input/input3 [4.289025] hid-generic 0003:046B:FF10.0001: input,hidraw0: USB HID v1.10 Keyboard [American Megatrends Inc. 
Virtual Keyboard and Mouse] on usb-:00:1a.1-2/input0 [4.305993] input: American Megatrends Inc. Virtual Keyboard and Mouse as /devices/pci:00/:00:1a.1/usb4/4-2/4-2:1.1/input/input4 [4.307106] hid-generic 0003:046B:FF10.0002: input,hidraw1: USB HID v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on usb-:00:1a.1-2/input1 [4.326481] ata6: SATA link down (SStatus 0 SControl 300) [4.327324] scsi 7:0:0:0: Direct-Access ATA INTEL SSDSC2MH25 PWG4 PQ: 0 ANSI: 5 [4.329953] sd 7:0:0:0: [sdc] 488397168 512-byte logical blocks: (250 GB/232 GiB) [4.330639] scsi 14:0:0:0: Processor Marvell 91xx Config 1.01 PQ: 0 ANSI: 5 [4.333276] sd 7:0:0:0: [sdc] Write Protect is off [4.334746] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [4.334921] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [4.345622] sdc: sdc1 sdc2 [4.347493] sd 7:0:0:0: [sdc] Attached SCSI disk Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] Quirk to support Marvell 88SE91xx SATA controllers with Intel IOMMU.
-----Original Message-----
From: Andrew Cooks [mailto:aco...@gmail.com]
Sent: Friday, March 01, 2013 5:19 PM
To: Justin Piszcz
Cc: Joerg Roedel; YingChu; Chu Ying; Alex Williamson; bhelg...@google.com; David Woodhouse; open list:INTEL IOMMU (VT-d); open list; open list:PCI SUBSYSTEM
Subject: Re: [PATCH] Quirk to support Marvell 88SE91xx SATA controllers with Intel IOMMU.

On Sat, Mar 2, 2013 at 1:51 AM, Justin Piszcz wrote:
>
>
>

Thanks for testing!

No problem.

Against a clean 3.7.10 (from ftp.kernel.org)

# patch -p1 < ../patch/RFC-Fix-Intel-IOMMU-support-for-Marvell-88SE91xx-SATA-controllers..patch
patching file drivers/iommu/intel-iommu.c
patching file drivers/pci/quirks.c
Hunk #1 succeeded at 3230 (offset 3 lines).
patching file include/linux/pci.h
# pwd
/usr/src/linux-3.7.10

Full dmesg with the patch applied: (but with IOMMU off)
http://home.comcast.net/~jpiszcz/20130301/dmesg-full.txt

Full dmesg (as much as possible through netconsole with IOMMU on)
http://home.comcast.net/~jpiszcz/20130301/dmesg-iommu-on.txt

Let me know if anything else is needed, thanks.

Justin.
RE: [PATCH] Quirk to support Marvell 88SE91xx SATA controllers with Intel IOMMU.
On Sat, Mar 2, 2013 at 7:18 AM, Justin Piszcz wrote:
>
> Against a clean 3.7.10 (from ftp.kernel.org)
>
> # patch -p1 < ../patch/RFC-Fix-Intel-IOMMU-support-for-Marvell-88SE91xx-SATA-controllers..patch
> patching file drivers/iommu/intel-iommu.c
> patching file drivers/pci/quirks.c
> Hunk #1 succeeded at 3230 (offset 3 lines).
> patching file include/linux/pci.h
> # pwd
> /usr/src/linux-3.7.10
>

I've downloaded and patched the 3.7.10 tarball and still get the same output I got before; different output from yours. I'm not sure the patch is complete or applying correctly, are you? Could you please check whether the patch you're applying is the same as the attached file?

Hi,

Success!

Patch from e-mail:
# md5sum marvell_ghost_funcs.patch
718bfb5876e3538ec23a516ef28d03f5 marvell_ghost_funcs.patch

Kernel from ftp.kernel.org:
# md5sum linux-3.7.10.tar.bz2
56ec294a922b6112a1ef129668f38a83 linux-3.7.10.tar.bz2

Decompress, patch, re-compile w/IOMMU=on.
# tar jxf linux-3.7.10.tar.bz2 ; ln -s linux-3.7.10 linux
# cd linux; patch -p1 < ../marvell_ghost_funcs.patch
patching file drivers/iommu/intel-iommu.c
Hunk #1 succeeded at 1672 (offset -2 lines).
Hunk #2 succeeded at 1729 (offset -2 lines).
Hunk #3 succeeded at 3833 (offset -2 lines).
patching file drivers/pci/quirks.c
Hunk #1 succeeded at 3210 (offset -39 lines).
Hunk #2 succeeded at 3240 (offset -39 lines).
Hunk #3 succeeded at 3258 (offset -39 lines).
patching file include/linux/pci.h
Hunk #1 succeeded at 1546 (offset -32 lines).
Hunk #2 succeeded at 1555 (offset -32 lines).
patching file include/linux/pci_ids.h

Reboot, re-test.
# lilo
Added 3.7.7-1
Added 3.7.10-5-ioff
Added 3.7.10-7(iommu=off w/patch) = OK
Added 3.7.10-8 * (iommu=on w/patch) = OK

dmesg w/patch + iommu
http://home.comcast.net/~jpiszcz/20130304/dmesg-success-patch.txt

Thanks!

Justin.
Re: [PATCH] Quirk to support Marvell 88SE91xx SATA controllers with Intel IOMMU.
On Sun, Mar 3, 2013 at 8:35 PM, Andrew Cooks wrote:
> On Sat, Mar 2, 2013 at 7:18 AM, Justin Piszcz wrote:
>>
>> Against a clean 3.7.10 (from ftp.kernel.org)
>>
>> # patch -p1 <
>> ../patch/RFC-Fix-Intel-IOMMU-support-for-Marvell-88SE91xx-SATA-controllers..
>> patch
>> patching file drivers/iommu/intel-iommu.c
>> patching file drivers/pci/quirks.c
>> Hunk #1 succeeded at 3230 (offset 3 lines).
>> patching file include/linux/pci.h
>> # pwd
>> /usr/src/linux-3.7.10
>>
>
> I've downloaded and patched the 3.7.10 tarball and still get the same
> output I got before; different output from yours. I'm not sure the
> patch is complete or applying correctly, are you?
> Could you please check whether the patch you're applying is the same

Hi,

As this patch has now been working for some time (against 3.7.x), I was wondering when it is going to be included in mainline? I upgraded to 3.8.x and rebooted, the same problem recurred, and I had to revert to 3.7.x.

Thanks,

Justin.
3.5.1 kernel: Oops + stracktrace + ext4 kernel errors!
Hello, Thoughts? Saw this when trying to copy files to array with Samba and doing file operations: [28939.505792] [ cut here ] [28939.505818] WARNING: at include/linux/iocontext.h:140 copy_process.part.50+0x115e/0x1220() [28939.505826] Hardware name: X8DTH-i/6/iF/6F [28939.505833] Pid: 16976, comm: dump Not tainted 3.5.1+ #3 [28939.505838] Call Trace: [28939.505847] [] warn_slowpath_common+0x75/0xb0 [28939.505855] [] warn_slowpath_null+0x15/0x20 [28939.505862] [] copy_process.part.50+0x115e/0x1220 [28939.505869] [] do_fork+0x13b/0x2f0 [28939.505880] [] ? recalc_sigpending+0x12/0x30 [28939.505888] [] ? __set_current_blocked+0x3a/0x60 [28939.505898] [] sys_clone+0x23/0x30 [28939.505908] [] stub_clone+0x13/0x20 [28939.505916] [] ? system_call_fastpath+0x1a/0x1f [28939.505922] ---[ end trace bb4eebc57a10f73a ]--- [29113.279716] 3w-sas: scsi0: AEN: INFO (0x04:0x0029): Verify started:unit=0. [29367.345433] BUG: unable to handle kernel NULL pointer dereference at 0028 [29367.345455] IP: [] ext4_ext_remove_space+0x89c/0xc90 [29367.345471] PGD c1ef31067 PUD aa4435067 PMD 0 [29367.345485] Oops: [#1] SMP [29367.345495] CPU 4 [29367.345503] Pid: 16922, comm: rsync Tainted: GW3.5.1+ #3 Supermicro X8DTH-i/6/iF/6F/X8DTH [29367.345520] RIP: 0010:[] [] ext4_ext_remove_space+0x89c/0xc90 [29367.345534] RSP: 0018:880a7db79c98 EFLAGS: 00010246 [29367.345542] RAX: RBX: 0002 RCX: 0003c06c3600 [29367.345550] RDX: 0001 RSI: 0001f4b88bf3 RDI: 0002 [29367.345558] RBP: 880a7db79d88 R08: c06c3600 R09: 8806245245c0 [29367.345566] R10: R11: R12: 0001 [29367.345574] R13: 8806245245f0 R14: 88029948b0cc R15: 8800b53596f0 [29367.345582] FS: 7f8b5c30e700() GS:88063fc8() knlGS: [29367.345593] CS: 0010 DS: ES: CR0: 8005003b [29367.345598] CR2: 0028 CR3: 000b59a6c000 CR4: 07e0 [29367.345604] DR0: DR1: DR2: [29367.345609] DR3: DR6: 0ff0 DR7: 0400 [29367.345615] Process rsync (pid: 16922, threadinfo 880a7db78000, task 8800bb318cf0) [29367.345621] Stack: [29367.345624] 880a7db79cd8 8116821b 880a7db79ce8 
8800b53596f0 [29367.345638] 8808840eb600 880a7db79d40 88062554a000 8800fff5 [29367.345651] 880a7db79d28 8800b5359660 88062554a000 880624524620 [29367.345664] Call Trace: [29367.345671] [] ? __ext4_handle_dirty_metadata+0x7b/0x100 [29367.345678] [] ext4_ext_truncate+0x173/0x1b0 [29367.345685] [] ? ext4_mark_inode_dirty+0x66/0x170 [29367.345693] [] ext4_truncate+0x5d/0x70 [29367.345699] [] ext4_evict_inode+0x378/0x3d0 [29367.345707] [] evict+0xaa/0x1a0 [29367.345713] [] iput+0x103/0x210 [29367.345720] [] do_unlinkat+0x154/0x1c0 [29367.345729] [] ? vfs_write+0x118/0x160 [29367.345739] [] ? sys_write+0x45/0xa0 [29367.345745] [] sys_unlink+0x11/0x20 [29367.345753] [] system_call_fastpath+0x1a/0x1f [29367.345759] Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 c5 f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 9c f8 ff ff 0f 1f 80 00 00 00 00 41 [29367.345874] RIP [] ext4_ext_remove_space+0x89c/0xc90 [29367.345881] RSP [29367.345885] CR2: 0028 [29367.345890] ---[ end trace bb4eebc57a10f73b ]--- [35775.632435] 3w-sas: scsi0: AEN: INFO (0x04:0x002B): Verify completed:unit=0. [39395.965177] 3w-sas: scsi0: AEN: INFO (0x04:0x0091): Unit now in standby mode:unit=0. [50143.132858] 3w-sas: scsi0: AEN: INFO (0x04:0x0090): Unit now in active mode:unit=0. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 3.5.1 kernel: Oops + stracktrace + ext4 kernel errors!
On Fri, Aug 24, 2012 at 11:31 AM, Justin Piszcz wrote:
> Hello,
>
> Thoughts?
..

Going back to XFS. EXT4 appears unstable once more than 16TB is on the array (a 60TB ext4 filesystem).

Justin.
RE: 3.5.1 kernel: Oops + stracktrace + ext4 kernel errors!
-Original Message-
From: Theodore Ts'o [mailto:ty...@mit.edu]
Sent: Friday, August 24, 2012 6:39 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; linux-e...@vger.kernel.org; al piszcz
Subject: Re: 3.5.1 kernel: Oops + stracktrace + ext4 kernel errors!

On Fri, Aug 24, 2012 at 11:31:44AM -0400, Justin Piszcz wrote:
> Hello,
>
> Thoughts?
>
> Saw this when trying to copy files to array with Samba and doing file
> operations:
>
> [28939.505792] [ cut here ]
> [29367.345433] BUG: unable to handle kernel NULL pointer dereference
> at 0028
> [29367.345455] IP: [] ext4_ext_remove_space+0x89c/0xc90

Fixed by commit 89a4e48f84 in upstream. It is scheduled for inclusion in a stable kernel series; I believe it should be in 3.5.3.

Regards,

- Ted

--

Thanks. If/when I come across another box I can test with, I will ensure that patch (89a4e48f84) gets applied. For PROD hosts I need stability > 16T.

Justin.
3.5.2: moving files from xfs/disk -> nfs: radix_tree_lookup_slot+0xe/0x10
Hi, Moving ~276GB of files (mainly large backups) and everything has seemed to lockup on the client moving data to the server, it is still in this state.. [75716.705697] INFO: task sync:8790 blocked for more than 120 seconds. [75716.705701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [75716.705703] syncD 88040ec54830 0 8790 2141 0x [75716.705708] 88001fff1d08 0086 81862fc0 88001fff1fd8 [75716.705713] 88001fff1fd8 4000 88041d958670 88040ec54830 [75716.705716] 88001fff1c38 812dcaee 88001fff1c58 88001fff1d78 [75716.705720] Call Trace: [75716.705729] [] ? radix_tree_lookup_slot+0xe/0x10 [75716.705733] [] ? find_get_pages_tag+0xc6/0x150 [75716.705738] [] ? __enqueue_entity+0x70/0x80 [75716.705742] [] ? __sync_filesystem+0x90/0x90 [75716.705747] [] schedule+0x24/0x70 [75716.705751] [] schedule_timeout+0x1a9/0x210 [75716.705755] [] ? calc_period_shift+0x60/0x60 [75716.705760] [] ? check_preempt_curr+0x75/0xa0 [75716.705764] [] wait_for_common+0xc0/0x150 [75716.705767] [] ? try_to_wake_up+0x280/0x280 [75716.705770] [] ? __sync_filesystem+0x90/0x90 [75716.705773] [] wait_for_completion+0x18/0x20 [75716.705777] [] writeback_inodes_sb_nr+0x77/0xa0 [75716.705782] [] ? shrink_dcache_for_umount_subtree+0x111/0x1d0 [75716.705785] [] writeback_inodes_sb+0x29/0x40 [75716.705788] [] __sync_filesystem+0x47/0x90 [75716.705791] [] sync_one_sb+0x1b/0x20 [75716.705795] [] iterate_supers+0xe1/0xf0 [75716.705798] [] sys_sync+0x2b/0x60 [75716.705802] [] system_call_fastpath+0x1a/0x1f [75836.701197] INFO: task sync:8790 blocked for more than 120 seconds. Thoughts? Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Upgraded from 3.4 to 3.5.1 kernel: machine does not boot
Hello,

Motherboard: Supermicro X8DTH-6F
Distro: Debian Testing x86_64

From 3.4 -> 3.5.1 on x86_64: after make oldconfig and a few minor changes, the machine attempts to boot but hangs at the filesystem mounting part of the boot process.

Picture of where it stops working (a little blurry but readable)
http://home.comcast.net/~jpiszcz/20120810/3.5-kernel-hangs.jpg

Kernel config 3.4 (working)
http://home.comcast.net/~jpiszcz/20120810/config-3.4.txt

Kernel config 3.5.1 (hangs)
http://home.comcast.net/~jpiszcz/20120810/config-3.5.1.txt

As you can see towards the end, the machine has been sitting there for 1 hour, as that is the timeout after which the drives spin down on the 3ware card.

Any thoughts as to what is wrong here?

Diff between the two:

$ diff -u config-3.4.txt config-3.5.1.txt |grep '^+C'
+CONFIG_ARCH_SUPPORTS_UPROBES=y
+CONFIG_BUILDTIME_EXTABLE_SORT=y
+CONFIG_CLOCKSOURCE_WATCHDOG=y
+CONFIG_ARCH_CLOCKSOURCE_DATA=y
+CONFIG_GENERIC_TIME_VSYSCALL=y
+CONFIG_GENERIC_CLOCKEVENTS=y
+CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
+CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
+CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
+CONFIG_GENERIC_CMOS_UPDATE=y
+CONFIG_TICK_ONESHOT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_RCU_FANOUT_LEAF=16
+CONFIG_GENERIC_SMP_IDLE_THREAD=y
+CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
+CONFIG_SECCOMP_FILTER=y
+CONFIG_CROSS_MEMORY_ATTACH=y
+CONFIG_X86_DEV_DMA_OPS=y
+CONFIG_NETFILTER_NETLINK=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_HAVE_BPF_JIT=y
+CONFIG_E1000E=y
+CONFIG_IXGBE_HWMON=y
+CONFIG_NET_VENDOR_I825XX=y
+CONFIG_HID=y
+CONFIG_HIDRAW=y
+CONFIG_HID_GENERIC=y
+CONFIG_USB_HID=y
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+CONFIG_NEW_LEDS=y
+CONFIG_LEDS_CLASS=y
+CONFIG_NFS_V2=y
+CONFIG_PANIC_ON_OOPS_VALUE=0
+CONFIG_RCU_CPU_STALL_INFO=y
+CONFIG_CRYPTO_CRC32C=y
+CONFIG_GENERIC_STRNCPY_FROM_USER=y
+CONFIG_GENERIC_STRNLEN_USER=y

Justin.
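A side note on the config comparison above: diff -u piped through grep '^+C' also lists options that merely moved within the file, so the list can look longer than the real change. The following is a rough Python sketch of an order-independent comparison, assuming the two config files have already been read into strings. The function name added_config_options is hypothetical, and this is not a replacement for the kernel's own tooling.

```python
def added_config_options(old_text, new_text):
    """Return CONFIG_ lines present in new_text but not in old_text.

    Lines are compared as exact strings, so an option whose value changed
    (for example =m to =y) also shows up as added.
    """
    def options(text):
        # Keep only set options; "# CONFIG_FOO is not set" lines are skipped.
        return {
            line.strip()
            for line in text.splitlines()
            if line.startswith("CONFIG_")
        }

    return sorted(options(new_text) - options(old_text))


if __name__ == "__main__":
    old = "CONFIG_A=y\n# CONFIG_B is not set\n"
    new = "CONFIG_A=y\nCONFIG_B=y\nCONFIG_C=16\n"
    for option in added_config_options(old, new):
        print("+" + option)
```

Because the comparison uses sets, an option that only moved to a different place in the file is not reported.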
Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot
On Fri, Aug 10, 2012 at 1:53 PM, Jesper Juhl wrote: > On Fri, 10 Aug 2012, Justin Piszcz wrote: > >> Hello, >> >> Motherboard: Supermicro X8DTH-6F >> Distro: Debian Testing x86_64 >> >> >From 3.4 -> 3.5.1 on x86_64 make oldconfig and a few minor changes and the >> machine attempts to boot but hangs at the filesystem mounting part of the >> boot process. Hi, Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem (60TB). The 3.4 kernel works fine. This is proven by commenting out the filesystem in /etc/fstab with 3.5.1, and all is OK. When I run mount for that filesystem, it hangs, I ran alt+sysrq+t to get additional output and I have pasted it below with the 3.5.1 kernel: [ 160.373406] mount R running task0 4361 4355 0x [ 160.373407] 8806266bdb68 0086 8806266bdaa8 8806266bdfd8 [ 160.373410] 8806266bdfd8 4000 8806270b0600 880626c73a10 [ 160.373413] 00011240 880c260177c0 880c260177c0 [ 160.373415] Call Trace: [ 160.373416] [] ? __schedule+0x299/0x770 [ 160.373418] [] __cond_resched+0x25/0x40 [ 160.373420] [] _cond_resched+0x2a/0x40 [ 160.373421] [] ext4_calculate_overhead+0x239/0x3e0 [ 160.373425] [] ext4_fill_super+0x1aa9/0x2930 [ 160.373427] [] mount_bdev+0x19f/0x1e0 [ 160.373429] [] ? ext4_calculate_overhead+0x3e0/0x3e0 [ 160.373431] [] ext4_mount+0x10/0x20 [ 160.373433] [] mount_fs+0x1b/0xd0 [ 160.373434] [] vfs_kern_mount+0x6f/0x110 [ 160.373437] [] do_kern_mount+0x4f/0x100 [ 160.373439] [] do_mount+0x2fe/0x8a0 [ 160.373440] [] ? strndup_user+0x53/0x70 [ 160.373442] [] sys_mount+0x90/0xe0 [ 160.373443] [] system_call_fastpath+0x1a/0x1f [ 160.373446] jbd2/sda1-8 S 880c2675f800 0 4362 2 0x [ 160.373448] 880623ca9e50 0046 880626c73a10 880623ca9fd8 [ 160.373450] 880623ca9fd8 4000 8806271b9850 880626d08250 [ 160.373453] 880623ca9da0 8806266bdbe0 880c2675f8a0 880c2675f888 [ 160.373455] Call Trace: [ 160.373456] [] ? default_wake_function+0xd/0x10 [ 160.373458] [] ? autoremove_wake_function+0x11/0x40 [ 160.373460] [] ? 
__wake_up_common+0x55/0x90 [ 160.373462] [] schedule+0x24/0x70 [ 160.373463] [] kjournald2+0x1ce/0x1e0 [ 160.373465] [] ? abort_exclusive_wait+0xb0/0xb0 [ 160.373467] [] ? commit_timeout+0x10/0x10 [ 160.373469] [] kthread+0x8e/0xa0 [ 160.373471] [] kernel_thread_helper+0x4/0x10 [ 160.373472] [] ? kthread_flush_work_fn+0x10/0x10 [ 160.373474] [] ? gs_change+0xb/0xb Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot
-Original Message- From: Justin Piszcz [mailto:jpis...@lucidpixels.com] Sent: Friday, August 10, 2012 5:46 PM To: Jesper Juhl Cc: linux-kernel@vger.kernel.org; a...@solarrain.com Subject: Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot On Fri, Aug 10, 2012 at 1:53 PM, Jesper Juhl wrote: > On Fri, 10 Aug 2012, Justin Piszcz wrote: > >> Hello, >> >> Motherboard: Supermicro X8DTH-6F >> Distro: Debian Testing x86_64 >> >> >From 3.4 -> 3.5.1 on x86_64 make oldconfig and a few minor changes and the >> machine attempts to boot but hangs at the filesystem mounting part of the >> boot process. Hi, Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem (60TB). The 3.4 kernel works fine. This is proven by commenting out the filesystem in /etc/fstab with 3.5.1, and all is OK. -- Hi again, I tested with linux-3.6-rc1: The same problem, here is what I get from the strace: irectory) 4434 readlink("/dev", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument) 4434 readlink("/dev/sda1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument) 4434 readlink("/r1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument) 4434 getuid() = 0 4434 geteuid() = 0 4434 getgid() = 0 4434 getegid() = 0 4434 prctl(PR_GET_DUMPABLE)= 1 4434 lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0 4434 getuid() = 0 4434 geteuid() = 0 4434 getgid() = 0 4434 getegid() = 0 4434 prctl(PR_GET_DUMPABLE)= 1 4434 stat("/run", {st_mode=S_IFDIR|0755, st_size=820, ...}) = 0 4434 lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 4434 open("/run/mount/utab", O_RDWR|O_CREAT, 0644) = 3 4434 close(3) = 0 4434 mount("/dev/sda1", "/r1", "ext4", MS_MGC_VAL|MS_NOATIME, NULL -- (w/ 3.6-rc1) [ 89.868843] mount R running task0 4434 4433 0x0009 [ 89.868847] 880c246b7b68 816c9279 880c246b7aa8 880c246b7fd8 [ 89.868851] 880c246b7fd8 4000 88062720cdb0 880c246862d0 [ 89.868855] 000116c0 880623a863c0 880623a863c0 [ 89.868855] Call Trace: [ 89.868858] [] ? 
__schedule+0x299/0x770 [ 89.868860] [] ? __schedule+0x299/0x770 [ 89.868864] [] ? ext4_get_group_desc+0x49/0xb0 [ 89.868868] [] ? ext4_calculate_overhead+0x131/0x3e0 [ 89.868871] [] ? ext4_fill_super+0x1a4b/0x28d0 [ 89.868875] [] ? mount_bdev+0x1a1/0x1e0 [ 89.868877] [] ? ext4_calculate_overhead+0x3e0/0x3e0 [ 89.868880] [] ? ext4_mount+0x10/0x20 [ 89.868882] [] ? mount_fs+0x1b/0xd0 [ 89.868885] [] ? vfs_kern_mount+0x6f/0x110 [ 89.86] [] ? do_kern_mount+0x4f/0x100 [ 89.868890] [] ? do_mount+0x2fe/0x8a0 [ 89.868894] [] ? strndup_user+0x53/0x70 [ 89.868896] [] ? sys_mount+0x90/0xe0 [ 89.868899] [] ? tracesys+0xd4/0xd9 Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot
On Fri, Aug 10, 2012 at 7:07 PM, Justin Piszcz > > Hi, > > Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem > (60TB). > > The 3.4 kernel works fine. > > This is proven by commenting out the filesystem in /etc/fstab with > 3.5.1, and all is OK. > > -- > > Hi again, > > I tested with linux-3.6-rc1: > > The same problem, here is what I get from the strace: > > irectory) > 4434 readlink("/dev", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument) > 4434 readlink("/dev/sda1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid > argument) > 4434 readlink("/r1", 0x7fff3b05c670, 4096) = -1 EINVAL (Invalid argument) > 4434 getuid() = 0 > 4434 geteuid() = 0 > 4434 getgid() = 0 > 4434 getegid() = 0 > 4434 prctl(PR_GET_DUMPABLE)= 1 > 4434 lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0 > 4434 getuid() = 0 > 4434 geteuid() = 0 > 4434 getgid() = 0 > 4434 getegid() = 0 > 4434 prctl(PR_GET_DUMPABLE)= 1 > 4434 stat("/run", {st_mode=S_IFDIR|0755, st_size=820, ...}) = 0 > 4434 lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 > 4434 open("/run/mount/utab", O_RDWR|O_CREAT, 0644) = 3 > 4434 close(3) = 0 > 4434 mount("/dev/sda1", "/r1", "ext4", MS_MGC_VAL|MS_NOATIME, NULL > > -- > > (w/ 3.6-rc1) > > [ 89.868843] mount R running task0 4434 4433 > 0x0009 > [ 89.868847] 880c246b7b68 816c9279 880c246b7aa8 > 880c246b7fd8 > [ 89.868851] 880c246b7fd8 4000 88062720cdb0 > 880c246862d0 > [ 89.868855] 000116c0 880623a863c0 880623a863c0 > > [ 89.868855] Call Trace: > [ 89.868858] [] ? __schedule+0x299/0x770 > [ 89.868860] [] ? __schedule+0x299/0x770 > [ 89.868864] [] ? ext4_get_group_desc+0x49/0xb0 > [ 89.868868] [] ? ext4_calculate_overhead+0x131/0x3e0 > [ 89.868871] [] ? ext4_fill_super+0x1a4b/0x28d0 > [ 89.868875] [] ? mount_bdev+0x1a1/0x1e0 > [ 89.868877] [] ? ext4_calculate_overhead+0x3e0/0x3e0 > [ 89.868880] [] ? ext4_mount+0x10/0x20 > [ 89.868882] [] ? mount_fs+0x1b/0xd0 > [ 89.868885] [] ? vfs_kern_mount+0x6f/0x110 > [ 89.86] [] ? 
do_kern_mount+0x4f/0x100 > [ 89.868890] [] ? do_mount+0x2fe/0x8a0 > [ 89.868894] [] ? strndup_user+0x53/0x70 > [ 89.868896] [] ? sys_mount+0x90/0xe0 > [ 89.868899] [] ? tracesys+0xd4/0xd9 > > Justin. > > > CC: linux-ext4 Any ideas here (kernel 3.4 and below can mount 60TB ext4 no issues) but > 3.5.1 (did not try 3.5) cannot mount the filesystem. Justin. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
3.4->(3.5.1 || 3.6-rc1) => can no longer mount 60TB ext4 filesystem
Hello,

I upgrade to each new kernel release and with 3.5.1 (from 3.4) I can no longer mount my 60TB ext4 volume.
If I boot back to 3.4, it works fine.

Details here:
https://lkml.org/lkml/2012/8/10/205

Anything I can do besides testing each 3.5-rcX to find where the regression lies?

Justin.
Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot
On Sun, Aug 12, 2012 at 9:10 AM, Eric Sandeen wrote:
> On 8/10/12 11:14 PM, Justin Piszcz wrote:
>> On Fri, Aug 10, 2012 at 7:07 PM, Justin Piszcz
>>>
>>> Hi,
>>>
>>> Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem
>>> (60TB).
>
> You are a brave man running ext4 at 60T, but thank you for testing :)
>
> Backing out 8aeb00ff85ad25453765dd339b408c0087db1527 from 3.5.1
> (952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream) probably helps?
>
> From a quick look, I think that essentially has a :
>
> for (i = 0; i < ngroups; i++) {
>
>     for (j = 0; j < ngroups; j++) {
>
>     }
> }
>
> type nested loop going on; for a filesystem this big it's going to take almost
> literally forever, if I read it right.
>
> -Eric

Hello,

It worked!! I can mount my filesystem now!

I pulled down 3.5 and backed out that commit. I could not quickly find a doc on how to do this, so I will add the steps below:

1. Clone Linux repo (3.5/stable as of this writing)
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

2. List commits:
git log

3. Show a specific commit
git show 8aeb00ff85ad25453765dd339b408c0087db1527

4. How to revert the commit:
git revert 8aeb00ff85ad25453765dd339b408c0087db1527

# On branch master
nothing to commit (working directory clean)

5. Recompile, reboot, does it work?

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 61T 17T 44T 28% /r1

# uname -a
Linux p34 3.5.0 #1 SMP Sun Aug 12 09:42:41 EDT 2012 x86_64 GNU/Linux

Yes!

CC: Greg to see if this can be backed out for 3.5.2?

Justin.
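To put a rough number on the nested loop Eric describes, here is a back-of-the-envelope sketch in Python. The 4 KiB block size and 32768 blocks per group are assumptions (common ext4 defaults); the actual geometry of this filesystem was not posted, and the function names are made up for illustration.

```python
BLOCK_SIZE = 4096         # bytes per block (assumed, not taken from the thread)
BLOCKS_PER_GROUP = 32768  # blocks per block group (assumed ext4 default)
TIB = 2 ** 40


def block_group_count(fs_bytes, block_size=BLOCK_SIZE,
                      blocks_per_group=BLOCKS_PER_GROUP):
    """Number of block groups needed to cover fs_bytes (rounded up)."""
    total_blocks = fs_bytes // block_size
    return -(-total_blocks // blocks_per_group)  # ceiling division


def nested_loop_iterations(ngroups):
    """Inner-loop executions for a loop over all groups inside another."""
    return ngroups * ngroups


ngroups = block_group_count(60 * TIB)
print(ngroups)                          # 491520
print(nested_loop_iterations(ngroups))  # 241591910400
```

Under these assumptions a 60 TiB volume has about half a million block groups, so a loop over all groups nested inside another runs on the order of 2.4e11 times at mount, which fits the observation that mount sits in ext4_calculate_overhead.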
Re: 3.4->(3.5.1 || 3.6-rc1) => can no longer mount 60TB ext4 filesystem
On Sun, Aug 12, 2012 at 9:31 AM, Daniel Mack wrote:
> On 11.08.2012 19:36, Justin Piszcz wrote:
>> Hello,
>>
>> I upgrade to each new kernel release and with 3.5.1 (from 3.4) I can no
>> longer mount my 60TB ext4 volume.
>> If I boot back to 3.4, it works fine.
>>
>> Details here:
>> https://lkml.org/lkml/2012/8/10/205
>>
>> Anything I can do besides testing each 3.5-rcX to find where the regression
>> lies?
>
> I'm not at all familiar with ext4 internals, but as always, an easy way
> is to bisect such a problem. Any chance you can do that?
>
> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>
>
> Daniel
>

Hi,

Thanks, will save this for the future. I would have done this, but Eric Sandeen found the offending patch; I reverted it and I can mount the filesystem now.

The bad commit was 8aeb00ff85ad25453765dd339b408c0087db1527 (per sandeen)

Justin.
Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot
On Sun, Aug 12, 2012 at 10:13 AM, Paul Gortmaker wrote:
> On Sun, Aug 12, 2012 at 9:51 AM, Justin Piszcz
> wrote:
>> On Sun, Aug 12, 2012 at 9:10 AM, Eric Sandeen wrote:
>>> On 8/10/12 11:14 PM, Justin Piszcz wrote:
>>>> On Fri, Aug 10, 2012 at 7:07 PM, Justin Piszcz
>>>>>
>>>>> Hi,
>>>>>
>>>>> Found the root cause, the 3.5.1 kernel cannot mount my ext4 filesystem
>>>>> (60TB).
>>>
>>> You are a brave man running ext4 at 60T, but thank you for testing :)
>>>
>>> Backing out 8aeb00ff85ad25453765dd339b408c0087db1527 from 3.5.1
>>> (952fc18ef9ec707ebdc16c0786ec360295e5ff15 upstream) probably helps?
>>>
>>> From a quick look, I think that essentially has a :
>>>
>>> for (i = 0; i < ngroups; i++) {
>>>
>>>     for (j = 0; j < ngroups; j++) {
>>>
>>>     }
>>> }
>>>
>>> type nested loop going on; for a filesystem this big it's going to take
>>> almost
>>> literally forever, if I read it right.
>>>
>>> -Eric
>>
>> Hello,
>>
>> It worked!! I can mount my filesystem now!
>>
>> I pulled down 3.5 and backed out that commit, I could not quickly find
>> a doc to do this, so I will add how to do that below:
>>
>> 1. Clone Linux repo (3.5/stable as of this writing)
>> git clone
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>>
>> 2. List commits:
>> git log
>>
>> 3. Show a specific commit
>> git show 8aeb00ff85ad25453765dd339b408c0087db1527
>>
>> 4. How to revert the commit:
>> git revert 8aeb00ff85ad25453765dd339b408c0087db1527
>>
>> # On branch master
>> nothing to commit (working directory clean)
>
> You didn't actually revert anything here, because your clone left
> you on "master" branch, which points at 3.5 (i.e. 3.5.0). It does
> not contain the commit which is of interest to you.
>
> -
> linux-stable$git tag --contains 8aeb00ff
> v3.5.1
> linux-stable$git branch --contains 8aeb00ff
> linux-3.5.y
> linux-stable$
>
>
> The master branch in linux-stable is left pointing at one of the
> most recent mainline (i.e. non-stable) tags, and all of the stable
> content is on individual branches (type "git branch" to see them).
>
> So if you do a "git checkout linux-3.5.y" and then do the revert,
> you will actually be testing what you wanted to test.
>
> Paul.
> --

Yikes, I saw the commit details (via git show), but that must check the commit via git/inet. I assumed it was also in the 3.5 tree, but it's not per your check, so I've made some changes to my notes, recompiled, rebooted .. and success!! Woohoo!

One other item: I've never used a git-updated kernel before. I usually just patch -p1 the mainline or pull it down directly (but git seems nicer now that I know how to do it). Does the '+' signify that it's the 3.5.1 kernel, with a '+' because I made changes to it?

p34:~# df -h | grep /r1
/dev/sda1 61T 16T 45T 26% /r1
p34:~# uname -a
Linux p34 3.5.1+ #3 SMP Sun Aug 12 10:31:34 EDT 2012 x86_64 GNU/Linux
p34:~# uptime
10:35:12 up 1 min, 1 user, load average: 0.05, 0.03, 0.01
p34:~#

---

Updated notes:

1. Clone Linux repo (3.5/stable as of this writing)
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-3.5

2.0 Cd into linux-3.5
cd linux-3.5

2.1 Check available kernel versions:
git tag | tail -n 3
v3.5-rc6
v3.5-rc7
v3.5.1

2.2 Update to the latest 3.5.1 kernel:
git checkout v3.5.1
Note: checking out 'v3.5.1'.
..
HEAD is now at cbd3c20... Linux 3.5.1

2.3 Confirm it is 3.5.1:
# head -n 3 Makefile
VERSION = 3
PATCHLEVEL = 5
SUBLEVEL = 1

2.4 List commits:
git log

3. Show a specific commit
git show 8aeb00ff85ad25453765dd339b408c0087db1527

4. How to revert the commit:
git revert 8aeb00ff85ad25453765dd339b408c0087db1527
(It brings up a text editor like svn/cvs, commit -> write/save/quit)

[detached HEAD 35d699f] Revert "ext4: fix overhead calculation used by ext4_statfs()"
Committer: root
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly:

git config --global user.name "Your Name"
git config --global user.email y...@example.com

After doing this, you may fix the identity used for this commit with:

git commit --amend --reset-author

4 files changed, 57 insertions(+), 132 deletions(-)

5. Recompile, reboot, does it work, still?

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 61T 17T 44T 28% /r1

# uname -a
Linux p34 3.5.0 #1 SMP Sun Aug 12 09:42:41 EDT 2012 x86_64 GNU/Linux

Yes!

--

How to find where a particular commit lies?

-
linux-stable$git tag --contains 8aeb00ff
v3.5.1
linux-stable$git branch --contains 8aeb00ff
linux-3.5.y
linux-stable$

--

Justin.
3.5.1: WARNING: at include/linux/iocontext.h:140 copy_process.part.56+0x1041/0x1190()
Hi,

Kernel: 3.5.1 x86_64

Just FYI, this may have been due to an NFS issue (the remote host was possibly turned off during the dump), but I am reporting it just in case:

[41793.725267] [ cut here ]
[41793.725273] WARNING: at include/linux/iocontext.h:140 copy_process.part.56+0x1041/0x1190()
[41793.725274] Hardware name: X9SCL/X9SCM
[41793.725276] Pid: 20378, comm: dump Not tainted 3.5.1 #2
[41793.725276] Call Trace:
[41793.725279] [] warn_slowpath_common+0x75/0xb0
[41793.725280] [] warn_slowpath_null+0x15/0x20
[41793.725282] [] copy_process.part.56+0x1041/0x1190
[41793.725283] [] do_fork+0x13b/0x2f0
[41793.725285] [] ? recalc_sigpending+0x12/0x30
[41793.725287] [] ? __set_current_blocked+0x3a/0x60
[41793.725290] [] sys_clone+0x23/0x30
[41793.725293] [] stub_clone+0x13/0x20
[41793.725294] [] ? system_call_fastpath+0x1a/0x1f
[41793.725295] ---[ end trace 6afd1df8f82f60e4 ]---

Justin.
Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question
Hi, Is the following normal on an X9SRL-F board (bios 1.0a)? In the manual it states: Data Direct I/O Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which significantly reduces CPU overhead by leveraging CPU architectural improvements and freeing the system resource for other tasks. The options are Disabled and Enabled. Default is Enabled. When enabled in the kernel, I see the following: [0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00 [0.696487] ioatdma :00:04.0: channel error register unreachable [0.696546] ioatdma :00:04.0: channel enumeration error [0.696604] ioatdma :00:04.0: Intel(R) I/OAT DMA Engine init failed [0.696721] ioatdma :00:04.1: channel error register unreachable [0.696779] ioatdma :00:04.1: channel enumeration error [0.697522] ioatdma :00:04.1: Intel(R) I/OAT DMA Engine init failed [0.697617] ioatdma :00:04.2: channel error register unreachable [0.697681] ioatdma :00:04.2: channel enumeration error [0.697739] ioatdma :00:04.2: Intel(R) I/OAT DMA Engine init failed [0.697831] ioatdma :00:04.3: channel error register unreachable [0.697890] ioatdma :00:04.3: channel enumeration error [0.697948] ioatdma :00:04.3: Intel(R) I/OAT DMA Engine init failed [0.698037] ioatdma :00:04.4: channel error register unreachable [0.698095] ioatdma :00:04.4: channel enumeration error [0.698153] ioatdma :00:04.4: Intel(R) I/OAT DMA Engine init failed [0.698245] ioatdma :00:04.5: channel error register unreachable [0.698303] ioatdma :00:04.5: channel enumeration error [0.698360] ioatdma :00:04.5: Intel(R) I/OAT DMA Engine init failed [0.698449] ioatdma :00:04.6: channel error register unreachable [0.698508] ioatdma :00:04.6: channel enumeration error [0.698565] ioatdma :00:04.6: Intel(R) I/OAT DMA Engine init failed [0.698676] ioatdma :00:04.7: channel error register unreachable [0.698735] ioatdma :00:04.7: channel enumeration error [0.698792] ioatdma :00:04.7: Intel(R) I/OAT DMA Engine init failed -- Also, I tried using ASPM (enabled 
in BIOS), but since ACPI Linux query is ignored, it fails to work: [0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored I assume this is something Supermicro has to fix? Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
Hello, Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE) anyway, is this a known issue with 3.6.10? When the link went down is when I rebooted/etc the remote host attached on the other end. I've not changed anything physically with the hardware and have been on 3.6.0-3.6.9 and noticed this when I moved to 3.6.10. [10270.229200] ixgbe :01:00.0 eth4: NIC Link is Down [10276.124937] ixgbe :01:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX [24529.430997] ixgbe :01:00.0 eth4: Detected Tx Unit Hang [24529.430997] Tx Queue <10> [24529.430997] TDH, TDT <4e>, <51> [24529.430997] next_to_use <51> [24529.430997] next_to_clean<4e> [24529.430997] tx_buffer_info[next_to_clean] [24529.430997] time_stamp <10172668f> [24529.430997] jiffies <101726ea4> [24529.431011] ixgbe :01:00.0 eth4: tx hang 1 detected on queue 10, resetting adapter [24529.431028] ixgbe :01:00.0 eth4: Reset adapter Thoughts? lspci -vvxx 01:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter (rev 01) Subsystem: Intel Corporation 82598EB 10-Gigabit AT2 Server Adapter Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
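As a reading aid for the Tx hang dump above, here is a small Python sketch that decodes the numbers. It assumes the values in angle brackets are hexadecimal, that the descriptor ring did not wrap between TDH and TDT, and that the tick rate (CONFIG_HZ) is whatever the reader's kernel was built with; none of that is stated in the log itself, and the helper names are made up.

```python
def parse_hex_field(text):
    """Turn a dump field such as '<4e>' into an integer (assumes hex)."""
    return int(text.strip("<> "), 16)


def jiffies_to_seconds(delta, hz):
    """Convert a jiffies difference to seconds for a given CONFIG_HZ."""
    return delta / hz


# Values copied from the dmesg output above.
time_stamp = parse_hex_field("<10172668f>")
jiffies = parse_hex_field("<101726ea4>")
tdh = parse_hex_field("<4e>")
tdt = parse_hex_field("<51>")

# Jiffies elapsed since the stuck buffer was time-stamped.
stalled_jiffies = jiffies - time_stamp
# Descriptors between head and tail (valid only if the ring did not wrap).
pending_descriptors = tdt - tdh

print(stalled_jiffies, pending_descriptors)
```

With these inputs the difference is 2069 jiffies with 3 descriptors outstanding; jiffies_to_seconds(stalled_jiffies, 1000) would put that at roughly 2 seconds if the kernel was built with HZ=1000, and proportionally longer with a lower HZ.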
RE: 3.7: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
-----Original Message-----
From: Justin Piszcz [mailto:jpis...@lucidpixels.com]
Sent: Saturday, December 15, 2012 10:49 AM
To: linux-kernel@vger.kernel.org
Subject: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang

Hello,

CORRECTION: Kernel 3.7
RE: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang
-----Original Message-----
From: devendra.aaru [mailto:devendra.a...@gmail.com]
Sent: Monday, December 17, 2012 1:39 PM
To: Justin Piszcz
Cc: linux-kernel@vger.kernel.org; net...@vger.kernel.org
Subject: Re: 3.6.10: Intel: ixgbe 0000:01:00.0 eth4: Detected Tx Unit Hang

Ccing netdev

On Sat, Dec 15, 2012 at 10:49 AM, Justin Piszcz wrote:
> Hello,
>
> Kernel 3.6.10, first time I have seen this that I can remember (on 10GbE)
> anyway, is this a known issue with 3.6.10?
>
> When the link went down is when I rebooted/etc the remote host attached on
> the other end.
> I've not changed anything physically with the hardware and have been on
> 3.6.0-3.6.9 and noticed this when I moved to 3.6.10.

--

> I don't believe we have seen Tx hangs in validation. If you could narrow
> down the conditions that lead to the Tx hang that would help a lot. Also
> the output of ethtool -S eth4 after the Tx hang occurs can be useful to get
> an idea of the load on the interface.
>
> Thanks,
> Emil

--

In this case I only have two servers that mount each other's NFS volumes,
and both were idle at the time. I rebooted one of the systems and that is
when I saw this. If I can find a repeatable pattern and/or capture the
ethtool output, I will update this thread. Thank you.

Justin.
X9SCM-F-O clock drift +1 second into the future when ntp running?
Hello, I migrated from an X7SPA to an X9SCM-F-O and now gpsd/ntp no longer sync with my GPS unit: http://www.amazon.com/GlobalSat-BU-353-USB-GPS-Receiver/dp/B000PKX2KA I did some digging and it looks like the system clock on this motherboard with the latest BIOS (2.00a) runs 1 second too fast when comparing to other NTP-synchronized machines. When comparing the clock on this vs. an atomic clock, the system clock is ~1 second faster, which is probably why the GPS has problems syncing. Is this a faulty motherboard clock or is this an issue with Ivy Bridge (I am using an E3-1200 V2 CPU) with the X9SCM-F-O and BIOS 2.00a? DMI Info: Handle 0x, DMI type 0, 24 bytes BIOS Information Vendor: American Megatrends Inc. Version: 2.0a Release Date: 06/08/2012 Address: 0xF Runtime Size: 64 kB ROM Size: 8192 kB NTP problem: Problem, the x127 for the GPS: $ ntpq -pn remote refid st t when poll reach delay offset jitter == x127.127.28.0.GPS.0 l 11 16 3770.0000.363 100.414 *204.235.61.9128.174.38.133 2 u 48 64 37 48.716 -985.27 326.767 +184.105.192.247 216.218.254.202 2 u 43 64 37 90.902 -987.55 332.766 +50.7.247.11485.114.26.1942 u 42 64 37 158.445 -985.19 330.627 +69.65.40.29 209.51.161.238 2 u 43 64 37 47.733 -984.50 329.232 Any idea why it consistently has a ~1 second offset? Is there a good way to fix this? Before, on the X7SPA I ran gpsd+ntp for years without any issues, it synchronized perfectly. Is this a BIOS issue? Kernel problem or HW issue? $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource tsc Full output: # adjtimex -p mode: 0 offset: 0 frequency: 1523449 maxerror: 1600 esterror: 1600 status: 8257 time_constant: 3 precision: 1 tolerance: 32768000 tick: 10015 raw time: 1341673069s 873569969us = 1341673069.873569969 return value = 5 After 5-10 minutes without ntpd running, the clock had drifted -2.27 seconds the other direction.. # ntpdate time.nist.gov 7 Jul 11:03:32 ntpdate[374]: step time server 128.138.140.44 offset -2.273077 sec Justin. 
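The `adjtimex -p` output above can be turned into a single ppm figure. This is a rough sketch, not part of the original report: the function name is invented, and it assumes the conventions documented for adjtimex, namely a nominal tick of 10000 microseconds and a frequency field scaled so that 65536 equals 1 ppm.

```python
def adjtimex_ppm(tick, frequency, nominal_tick=10000):
    """Total clock correction in ppm implied by adjtimex's tick and frequency."""
    # One unit of tick away from the nominal 10000 us is worth 100 ppm.
    tick_ppm = (tick - nominal_tick) * 1_000_000 / nominal_tick
    # The frequency field is scaled by 2**16, so 65536 means 1 ppm.
    freq_ppm = frequency / 65536
    return tick_ppm + freq_ppm


# tick and frequency as printed by "adjtimex -p" in this message
print(round(adjtimex_ppm(10015, 1523449), 1))  # 1523.2

# the "tolerance" field from the same output, converted the same way
print(32768000 / 65536)  # 500.0
```

Under those assumptions the clock was being corrected by roughly 1500 ppm, about three times the 500 ppm tolerance the same output reports.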
RE: X9SCM-F-O clock drift +1 second into the future when ntp running?
-Original Message- From: Justin Piszcz [mailto:jpis...@lucidpixels.com] Sent: Saturday, July 07, 2012 11:12 AM To: p...@lists.ntp.org; linux-kernel@vger.kernel.org Subject: X9SCM-F-O clock drift +1 second into the future when ntp running? Hello, I migrated from an X7SPA to an X9SCM-F-O and now gpsd/ntp no longer sync with my GPS unit: http://www.amazon.com/GlobalSat-BU-353-USB-GPS-Receiver/dp/B000PKX2KA I did some digging and it looks like the system clock on this motherboard with the latest BIOS (2.00a) runs 1 second too fast when comparing to other NTP-synchronized machines. -- I've done some more reading and what I've found is anything over 500ppm ntp cannot correct for; so is this would be a bad crystal/chip or is there something else wrong here? Supermicro X9SCM: (see the error_ppm) is all over the map: # adjtimex -a --- current --- -- suggested -- cmos time system-cmos error_ppm tick freqtick freq 1341688282 -0.731743 1341688292 -0.731619 12.4 9993 3497075 1341688302 -0.747186-1556.7 9993 3497075 10009659788 1341688312 -0.7471176.9 9993 34970759993 3045513 1341688322 -0.7470991.8 9993 34970759993 3378325 1341688332 -0.762636-1553.7 9993 3378325 10009344163 1341688342 -0.762650 -1.4 9993 33783259993 3470512 1341688352 -0.778221-1557.1 9993 3378325 10009566038 # adjtimex -a --- current --- -- suggested -- cmos time system-cmos error_ppm tick freqtick freq 1341688564 -0.591964 1341688574 -0.575373 1659.1 10009566038 1341688584 -0.574515 85.8 10009566038 10008 1497763 1341688594 -0.558007 1650.8 100095660389992 3789738 1341688604 -0.537084 2092.3 100095660389988 1071325 1341688614 -0.568492-3140.8 9988 1071325 10019 3744100 1341688624 -0.568774 -28.2 9988 10713259988 2919762 1341688634 -0.584618-1584.4 9988 1071325 10004 49663 Old MSI motherboard (Pentium 4-- see how the PPM is roughly in the same range) # adjtimex -a WARNING: CMOS time is 30.02 min ahead of system clock --- current --- -- suggested -- cmos time system-cmos error_ppm tick freqtick freq 1341690163 
-1800.416735 1341690173 -1800.417111 -37.6 9996 2365500 1341690183 -1800.417499 -38.7 9996 23655009996 4904456 1341690193 -1800.417894 -39.5 9996 23655009996 4956019 1341690203 -1800.418297 -40.3 9996 23655009996 5007582 1341690213 -1800.418308 -1.0 9996 50075829996 5074664 1341690223 -1800.418327 -1.9 9996 50075829996 5134039 1341690233 -1800.418353 -2.6 9996 50075829996 5179351 Found this: http://compgroups.net/comp.protocols.time.ntp/hopelessly-broken-clock/490495 set clocksource=hpet for boot options instead of TSC Did not make any difference, still all over the map: # adjtimex -a --- current --- -- suggested -- cmos time system-cmos error_ppm tick freqtick freq 1341690087 0.238485 1341690097 0.240449 196.4 10015 1475154 1341690107 0.257408 1695.9 10015 14751549998 1744166 1341690117 0.274176 1676.8 10015 14751549998 2995729 1341690127 0.290927 1675.1 10015 14751549998 310 1341690137 0.291142 21.5 9998 3109998 1697291 1341690147 0.275783-1535.9 9998 310 10013 5458916 1341690157 0.276029 24.6 9998 3109998 1494166 Has anyone seen anything like this before? I checked all of the BIOS options, did not see anything out of the ordinary here.. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
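To put the error_ppm figures above in perspective, they can be converted to seconds of drift per day and compared with the 500 ppm limit mentioned in this message. A small sketch with invented helper names; the 2.273077 s step over "5-10 minutes" is taken from the first message in this thread.

```python
NTP_MAX_PPM = 500.0  # correction limit quoted in this message


def drift_seconds(error_ppm, interval_s):
    """Seconds gained (+) or lost (-) over interval_s at a given ppm error."""
    return error_ppm * 1e-6 * interval_s


def ppm_from_drift(drift_s, interval_s):
    """Average ppm error implied by drifting drift_s seconds in interval_s."""
    return drift_s / interval_s * 1e6


# A few error_ppm samples from the adjtimex -a tables above.
for ppm in (12.4, -1556.7, 1659.1, -3140.8):
    per_day = drift_seconds(ppm, 86400)
    within = abs(ppm) <= NTP_MAX_PPM
    print(f"{ppm:8.1f} ppm -> {per_day:8.1f} s/day, within limit: {within}")

# The 2.273077 s ntpdate step after 5 to 10 minutes without ntpd.
for minutes in (5, 10):
    print(minutes, "min:", round(ppm_from_drift(2.273077, minutes * 60)), "ppm")
```

A sample of -1556.7 ppm corresponds to about 134 seconds per day, and the ntpdate step works out to several thousand ppm on average, so both sets of numbers point well outside the 500 ppm range.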
Kernel 2.6.xx - NFSv3 vs. Samba Data Transfer Semantics
I have three machines with the same motherboard and gigabit ethernet, ABIT IC7-G. Two are Linux (Debian) One is Windows 2000. When I copy 100 gigabytes from a Windows 2000 PC to either one of my Linux machines, I get a *SUSTAINED* transfer rate of 40-50MB/s over gigabit. Sustained meaning, when I watch gkrellm, eth0 never dips below 40MB/s. When I copy 100 gigabytes from one Linux box to the other over NFS, I see all sorts of weirdness, 64MB/s for a few seconds, then 40MB/s, then 10-30MB/s, then 0MB/s for 2-3 seconds then 7MB/s, it goes all over the place. I have tried different (r|w)sizes without any conclusive results, they do not seem to make much of a difference. A few examples, copy an ~800MB file to a Linux box: TCP/FTP: 226 65.484 seconds (measured here), 11.98 Mbytes per second 822514728 bytes received in 65.48 secs (12266.2 kB/s) UDP/NFSv3: 0.15user 12.00system 0:26.22elapsed 46%CPU (0avgtext+0avgdata 0maxresident)k0inputs+0outputs (0major+148minor)pagefaults 0swaps 0.14user 13.96system 0:28.31elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k0inputs+0outputs (0major+148minor)pagefaults 0swaps UDP/Samba, Win2K->Linux box: $ date +%s 1123338368 $ date +%s 1123338399 1123338399 - 1123338368 31 seconds I suppose NFS makes up for it bursting at such high speeds, but in some cases, a constant data rate is preferred. Are there any methods to duplicate the way Samba works to NFS? When NFS transfers are taking place, watching gkrellm, I see 64MB/s for a few seconds then it goes to 0 as the disk (hda) continues to write for 3-4 seconds, this continues on and off. With Samba and the W2K box pushing the data, it is more of a consistent stream with very few delays that are found with NFS. I am using the Intel e1000 driver for gigabit: :02:01.0 Ethernet controller: Intel Corp. 82547GI Gigabit Ethernet Controller. I *do* have the NAPI option enabled for the driver on both Linux machines. [*] Use Rx Polling (NAPI) Samba Config: # Increase overall throughput of samba. 
socket options = IPTOS_LOWDELAY TCP_NODELAY SO_SNDBUF=32768 SO_RCVBUF=8192 # Set max xmit size. max xmit = 8192 NFS Config/fstab Entry: machine:/mount /local/mount nfs rw,hard,intr,nfsvers=3 0 0 I am using XFS filesystems on both Linux machines. The drives are 7200RPM Seagate HDDs with either 2MB or 8MB of cache. Are there any 'tweaks' or 'hacks' to make NFS behave more like Samba or just to tune it in general that are not commonly known or found on google? Thanks, Justin. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
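The three timings above are easier to compare as average rates. A quick sketch, assuming the same 822514728-byte file was used for all three copies (the message only says "an ~800MB file", so that is an assumption), with an invented helper name:

```python
def mib_per_s(num_bytes, seconds):
    """Average transfer rate in MiB/s (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / seconds / (1024 * 1024)


FILE_BYTES = 822514728  # size reported by the FTP client

# Elapsed times from the message; the Samba figure is the date +%s difference.
runs = {
    "FTP (TCP)": 65.48,
    "NFSv3 run 1": 26.22,
    "NFSv3 run 2": 28.31,
    "Samba": 31.0,
}

for name, secs in runs.items():
    print(f"{name:12s} {mib_per_s(FILE_BYTES, secs):6.2f} MiB/s")
```

The FTP line comes out at 11.98 MiB/s, matching the "11.98 Mbytes per second" the server reported. On average NFS is the fastest of the three here even though its instantaneous rate is bursty.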
Question regarding HPET in the 2.6 series kernel.
[*] HPET Timer Support
[*]   Provide RTC interrupt
[*] HPET - High Precision Event Timer
[*]   Allow mmap of HPET

http://tlug.up.ac.za/guides/lkcg/arch_i386.html

HPET Timer Support
HPET_TIMER
This enables the use of the HPET for the kernel's internal timer. HPET is
the next generation timer replacing legacy 8254s. You can safely choose Y
here. However, HPET will only be activated if the platform and the BIOS
support this feature. Otherwise the 8254 will be used for timing services.
Choose N to continue using the legacy 8254 timer.

How do I determine if my BIOS has this feature?

$ dmesg | grep -i hpet
$ dmesg | grep -i 8254
$ dmesg | grep -i timer
..TIMER: vector=0x31 pin1=2 pin2=-1
PCI: Setting latency timer of device :02:01.0 to 64
$

Assuming it does, is there any reason to use or not to use this feature?

Thanks,

Justin.
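The dmesg checks above can be scripted. This is a sketch only, with made-up function names. The first helper does the same thing as `dmesg | grep -i hpet` on a saved log. The second reads the clocksource list from sysfs, which assumes a kernel that exposes that interface (not necessarily the 2.6 kernel discussed here) and returns None when the file is absent.

```python
def hpet_lines(dmesg_text):
    """Return the kernel log lines that mention HPET (case-insensitive)."""
    return [line for line in dmesg_text.splitlines() if "hpet" in line.lower()]


def clocksource_has_hpet(
    path="/sys/devices/system/clocksource/clocksource0/available_clocksource",
):
    """True/False if the clocksource list can be read, None if it cannot."""
    try:
        with open(path) as handle:
            return "hpet" in handle.read().split()
    except OSError:
        return None


# The two timer-related lines quoted in this message.
sample = (
    "..TIMER: vector=0x31 pin1=2 pin2=-1\n"
    "PCI: Setting latency timer of device :02:01.0 to 64\n"
)
print(hpet_lines(sample))  # []
```

An empty result only shows that the kernel did not log anything about HPET; by itself it does not say whether the BIOS lacks the timer, has it disabled, or simply does not advertise it.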
Kernel 2.6: P4 SMT Question.
General question: would a desktop or a server benefit more from SMT?

For a Pentium 4 with HT, we already use SMP. What are the advantages and
disadvantages of also enabling SMT in the kernel?

Thanks,

Justin.
XFS Oops Under 2.6.12.2
After a couple hours of use, I get this error on a linear RAID under 2.6.12.2 using loop-AES w/AES-256 encrypted filesystem. Anyone know what is wrong? Filesystem "loop1": XFS internal error xfs_da_do_buf(2) at line 2271 of file fs/xfs/xfs_da_btree.c. Caller 0xc025e807 [] xfs_da_do_buf+0x500/0x860 [] xfs_da_read_buf+0x57/0x60 [] xfs_da_read_buf+0x57/0x60 [] __tcp_data_snd_check+0xcb/0xe0 [] tcp_new_space+0x8d/0xa0 [] tcp_v4_rcv+0x585/0x810 [] xfs_da_read_buf+0x57/0x60 [] xfs_dir2_block_getdents+0xa4/0x330 [] xfs_dir2_block_getdents+0xa4/0x330 [] ip_local_deliver_finish+0x0/0x150 [] ip_rcv+0x391/0x510 [] xfs_bmap_last_offset+0xc2/0x120 [] ip_rcv_finish+0x0/0x290 [] xfs_dir2_put_dirent64_direct+0x0/0xc0 [] xfs_dir2_isblock+0x32/0x90 [] xfs_dir2_put_dirent64_direct+0x0/0xc0 [] xfs_dir2_getdents+0xa1/0x150 [] xfs_dir2_put_dirent64_direct+0x0/0xc0 [] xfs_readdir+0x75/0xc0 [] linvfs_readdir+0x10e/0x270 [] net_rx_action+0x6a/0xf0 [] vfs_readdir+0x77/0x90 [] filldir64+0x0/0x110 [] sys_getdents64+0x6f/0xb2 [] filldir64+0x0/0x110 [] syscall_call+0x7/0xb # lsmod Module Size Used by I am not using any special modules. Configuration file attached. config-2.6.12.2.txt.bz2 Description: Binary data
Kernel/Box Freezes Under Kernel 2.6.12.5
Kernel 2.6.12.5:

1- 400GB Seagate 8MB cache, 7200RPM, ATA/100 drive.
2- ATA/133 Maxtor (ATA/Promise Controller)

1) Attached the 400GB Seagate drive to the ATA/133 controller.
2) (Not mounted yet)
3) See below:
   hde: 781422768 sectors (400088 MB) w/8192KiB Cache, CHS=48641/255/63, UDMA(100)
4) Partition with fdisk (hde1).
5) mkfs.xfs /dev/hde1

*** KERNEL FREEZE *** (ENTIRE MACHINE LOCKS UP)

Do the SAME EXACT THING on the motherboard (INTEL) controller or an ATA/100
Promise Controller, and there are NO problems.

Either people with this problem are *not* reporting it or they do not know
where to report it. This is the second machine I have seen this problem
with.

Has anyone looked into this?

Justin.
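As a sanity check on the drive line above, the sector count can be converted to a capacity and compared with the 28-bit LBA limit. A minimal sketch with invented names, assuming 512-byte sectors and decimal megabytes:

```python
SECTOR_BYTES = 512
LBA28_MAX_SECTORS = 2 ** 28  # 28-bit LBA can address 268,435,456 sectors


def capacity_mb(sectors):
    """Capacity in decimal megabytes, as the kernel prints it."""
    return sectors * SECTOR_BYTES // 1_000_000


def needs_lba48(sectors):
    """True when the drive is too large to address with 28-bit LBA."""
    return sectors > LBA28_MAX_SECTORS


print(capacity_mb(781422768))  # 400088
print(needs_lba48(781422768))  # True
```

The drive is roughly three times larger than 28-bit LBA can address, so it has to be driven with 48-bit commands. That is consistent with the _EXT command name (MULTWRITE_EXT) that appears in the related error reports on this list.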
Re: Kernel/Box Freezes Under Kernel 2.6.12.5
I have three different Maxtor (Promise) ATA/133 controllers; it happens with
all three.

On Fri, 26 Aug 2005, Patrick McFarland wrote:

> On Friday 26 August 2005 05:36 pm, Justin Piszcz wrote:
> > 2- ATA/133 Maxtor (ATA/Promise Controller)
>
> Make sure it's actually the kernel and not that controller. Go find another
> identical one and test with it.
>
> --
> Patrick "Diablo-D3" McFarland || [EMAIL PROTECTED]
> "Computer games don't affect kids; I mean if Pac-Man affected us as kids,
> we'd all be running around in darkened rooms, munching magic pills and
> listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
> Inc, 1989
Re: Promise ATA/133 Errors With 2.6.10+
It appears that 2.6.13-rc7 has fixed the bug.

I would like to know *what* changed, but I'll probably never find out :(

On Thu, 28 Jul 2005, Andrew Morton wrote:

Justin Piszcz <[EMAIL PROTECTED]> wrote:

I have two different machines with the 7200.8 Seagate 8MB 400GB drives.
Both have ATA/133 controllers, the error is the same on both:

Jun 24 15:24:18 localhost kernel: hde: no DRQ after issuing MULTWRITE_EXT

I put the drive on an (older) Promise ATA/100 controller = works great!
I put the drive on the second box on the motherboard IDE interface = works
great!

What happened > 2.6.10 to the promise driver? ??

Jun 24 15:24:18 localhost kernel: PDC202XX: Primary channel reset.
Jun 24 15:24:18 localhost kernel: hde: timeout waiting for DMA
Jun 24 15:24:18 localhost kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jun 24 15:24:18 localhost kernel:
Jun 24 15:24:18 localhost kernel: ide: failed opcode was: unknown
Jun 24 15:24:18 localhost kernel: hde: drive not ready for command
Jun 24 15:24:18 localhost kernel: hde: status timeout: status=0xd0 { Busy }
Jun 24 15:24:18 localhost kernel:
Jun 24 15:24:18 localhost kernel: ide: failed opcode was: unknown
Jun 24 15:24:18 localhost kernel: PDC202XX: Primary channel reset.
Jun 24 15:24:18 localhost kernel: hde: no DRQ after issuing MULTWRITE_EXT
Jun 24 15:24:18 localhost kernel: ide2: reset: success

Is this still happening in 2.6.13-rc4? If so, can you please cc
linux-kernel on the reply?

Thanks.
Kernel 2.6.13-rc7 Latency Question
These options are self-explanatory:

( ) No Forced Preemption (Server)
( ) Voluntary Kernel Preemption (Desktop)
(X) Preemptible Kernel (Low-Latency Desktop)

It says 100 HZ or 250 HZ is good for SMP systems; however, what if I am
using a P4 machine with 1 CPU (HT)? Is 1000 HZ still the way to go, as it's
not really 2 *REAL* CPUs?

( ) 100 HZ
( ) 250 HZ
(X) 1000 HZ
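For reference, the three HZ choices above differ only in how often the periodic timer interrupt fires. A trivial sketch of the arithmetic (the helper name is invented); it does not settle which value suits a Hyper-Threading system.

```python
def tick_ms(hz):
    """Length of one timer tick in milliseconds for a given HZ value."""
    return 1000.0 / hz


for hz in (100, 250, 1000):
    print(f"HZ={hz}: timer tick every {tick_ms(hz):g} ms")
```

So 1000 HZ means a 1 ms tick and ten times as many timer interrupts per second as 100 HZ, on each logical CPU.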
Re: Kernel/Box Freezes Under Kernel 2.6.12.5
Yes, I have two separate machines with the same controller and HDD. As soon
as I found out it fixed the bug on one of them, I changed it on the other;
neither machine has crashed since.

On Fri, 26 Aug 2005, Patrick McFarland wrote:

> On Friday 26 August 2005 05:36 pm, Justin Piszcz wrote:
> > 2- ATA/133 Maxtor (ATA/Promise Controller)
>
> Make sure it's actually the kernel and not that controller. Go find another
> identical one and test with it.
>
> --
> Patrick "Diablo-D3" McFarland || [EMAIL PROTECTED]
> "Computer games don't affect kids; I mean if Pac-Man affected us as kids,
> we'd all be running around in darkened rooms, munching magic pills and
> listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
> Inc, 1989
Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
All, I am trying to get everyone together on this to hopefully solve a serious bug that I have seen on multiple machines with: a) A Promise ATA/133 controller (ATA/100 works OK) b) Kernel 2.6.12 or 2.6.13 (2.6.13-rc7 appears to be OK) The drive is a Seagate 7200.8 400GB 7200RPM 8MB cache disk. hde: ST3400832A, ATA DISK drive With older kernels, if I *DO NOT ENABLE DMA* it does not crash. If I *ENABLE DMA* then proceed to do anything with the disk, it will FREEZE the box, no oops, etc, *FREEZE*. hdparm -t /dev/hde mkfs.xfs -f /dev/hde1 Will freeze the box. --- Linux Kernel 2.6.13 final experiences the same problems as 2.6.12.5. I have e-mailed the list quite a few times with this issue, I am surprised very few people run into it. Here is the error in the logs: Aug 31 11:30:25 p34 kernel: hde: dma_timer_expiry: dma status == 0x20 Aug 31 11:30:25 p34 kernel: hde: DMA timeout retry Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset. Aug 31 11:30:25 p34 kernel: hde: timeout waiting for DMA Aug 31 11:30:25 p34 kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest } Aug 31 11:30:25 p34 kernel: hde: drive not ready for command Aug 31 11:30:25 p34 kernel: hde: status timeout: status=0xd0 { Busy } Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset. Aug 31 11:30:25 p34 kernel: hde: no DRQ after issuing MULTWRITE_EXT Aug 31 11:30:25 p34 kernel: ide2: reset: success After this, the machine locks up with 2.6.13. With 2.6.13-rc7, I have not seen this once. Can anyone offer any insight to why this is happening? I have a few machines with the ATA/133 controller and 400GB drives; therefore, I'd prefer to fix the problem rather than hooking up older, ATA/100 drives, just so I can run newer kernels... Thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
I do not even have IDE Taskfile Access enabled, so how is the kernel printing these error messages before it freezes? linux-2.6.13/drivers/ide/ide-taskfile.c:printk(KERN_ERR "%s: no DRQ after issuing %sWRITE%s\n", lqqq ATA/ATAPI/MFM/RLL support qqqk x x[ ] IDE Taskfile Access Anyone have any suggestions how I can solve this problem? On Wed, 31 Aug 2005, Justin Piszcz wrote: All, I am trying to get everyone together on this to hopefully solve a serious bug that I have seen on multiple machines with: a) A Promise ATA/133 controller (ATA/100 works OK) b) Kernel 2.6.12 or 2.6.13 (2.6.13-rc7 appears to be OK) The drive is a Seagate 7200.8 400GB 7200RPM 8MB cache disk. hde: ST3400832A, ATA DISK drive With older kernels, if I *DO NOT ENABLE DMA* it does not crash. If I *ENABLE DMA* then proceed to do anything with the disk, it will FREEZE the box, no oops, etc, *FREEZE*. hdparm -t /dev/hde mkfs.xfs -f /dev/hde1 Will freeze the box. --- Linux Kernel 2.6.13 final experiences the same problems as 2.6.12.5. I have e-mailed the list quite a few times with this issue, I am surprised very few people run into it. Here is the error in the logs: Aug 31 11:30:25 p34 kernel: hde: dma_timer_expiry: dma status == 0x20 Aug 31 11:30:25 p34 kernel: hde: DMA timeout retry Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset. Aug 31 11:30:25 p34 kernel: hde: timeout waiting for DMA Aug 31 11:30:25 p34 kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest } Aug 31 11:30:25 p34 kernel: hde: drive not ready for command Aug 31 11:30:25 p34 kernel: hde: status timeout: status=0xd0 { Busy } Aug 31 11:30:25 p34 kernel: PDC202XX: Primary channel reset. Aug 31 11:30:25 p34 kernel: hde: no DRQ after issuing MULTWRITE_EXT Aug 31 11:30:25 p34 kernel: ide2: reset: success After this, the machine locks up with 2.6.13. With 2.6.13-rc7, I have not seen this once. Can anyone offer any insight to why this is happening? 
I have a few machines with the ATA/133 controller and 400GB drives; therefore, I'd prefer to fix the problem rather than hooking up older, ATA/100 drives, just so I can run newer kernels... Thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Kernel 2.6.13 + IDE + MULTWRITE_EXT / DRQ Errors
I still get this error when the drive is on a Promise ATA/133 card. I have the same setup in two separate machines, the results are the same with kernel 2.6.13, ideas? Should I just get more ATA/100 cards and stop trying to figure out what the bug is? Keep in mind the Promise ATA/100 cards exhibit no such errors or problems as below. I have not received any responses concerning this bug. According to: http://www.ussg.iu.edu/hypermail/linux/kernel/0310.2/0693.html Disabling PIO multiwrite fixes this problem, how do you do that? As far as I can tell it is not enabled. # hdparm -vi /dev/hde /dev/hde: multcount= 0 (off) IO_support = 0 (default 16-bit) unmaskirq= 0 (off) using_dma= 1 (on) keepsettings = 0 (off) readonly = 0 (off) readahead= 256 (on) geometry = 48641/255/63, sectors = 781422768, start = 0 Model=ST3400832A, FwRev=3.01, SerialNo=3NF04YQK Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=off CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 AdvancedPM=no WriteCache=enabled Drive conforms to: device does not report version: * signifies the current active mode Another person states: http://www.red-hat.com/archives/redhat-list/2000-June/msg00289.html Hi I've been gone for awhile but didn't see a response to this. I solved the problem on one machine by changeing the setting in bios from UDMA AUTO to DISABLED. .I had another machine that periodically gave me this message that just had the hard drive and mother board replaced by the shop that built it because Seagate said that bios had missed identified the drive geometry and had the head sectors ect set wrong. Linda What is the real problem here? Kernel .config attached for 2.6.13. 
Please let me know, thanks. Sep 2 20:48:05 p34 XFS mounting filesystem hde1 Sep 2 20:48:25 p34 hde: dma_timer_expiry: dma status == 0x20 Sep 2 20:48:25 p34 hde: DMA timeout retry Sep 2 20:48:25 p34 PDC202XX: Primary channel reset. Sep 2 20:48:25 p34 hde: timeout waiting for DMA Sep 2 20:48:25 p34 hde: status error: status=0x58 { Sep 2 20:48:25 p34 DriveReady Sep 2 20:48:25 p34 SeekComplete Sep 2 20:48:25 p34 DataRequest Sep 2 20:48:25 p34 } Sep 2 20:48:25 p34 ide: failed opcode was: Sep 2 20:48:25 p34 unknown Sep 2 20:48:25 p34 hde: drive not ready for command Sep 2 20:48:26 p34 hde: status timeout: status=0xd0 { Sep 2 20:48:26 p34 Busy Sep 2 20:48:26 p34 } Sep 2 20:48:26 p34 ide: failed opcode was: Sep 2 20:48:26 p34 unknown Sep 2 20:48:26 p34 PDC202XX: Primary channel reset. Sep 2 20:48:26 p34 hde: no DRQ after issuing MULTWRITE_EXT Sep 2 20:48:26 p34 ide2: reset: Sep 2 20:48:26 p34 success lspci output: :03:06.0 Unknown mass storage controller: Promise Technology, Inc. 20269 (rev 02) :03:07.0 Unknown mass storage controller: Promise Technology, Inc. 
20269 (rev 02) $ cat /proc/interrupts CPU0 CPU1 0: 219088 0IO-APIC-edge timer 1: 9 0IO-APIC-edge i8042 9: 0 0 IO-APIC-level acpi 14: 3201 0IO-APIC-edge ide0 15: 12 0IO-APIC-edge ide1 16: 13893 0 IO-APIC-level eth0, libata, eth2 17:168 0 IO-APIC-level eth1 18:555 0 IO-APIC-level ide2 19:192 0 IO-APIC-level ide4, ide5 NMI: 0 0 LOC: 218910 218906 ERR: 0 MIS: 0 $ gcc -v gcc version 4.0.1 (Debian 4.0.1-2) processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Pentium(R) 4 CPU 3.40GHz stepping: 4 cpu MHz : 3409.857 cache size : 1024 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 1 fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr bogomips: 6821.59 processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Pentium(R) 4 CPU 3.40GHz stepping: 4 cpu MHz : 3409.857 cache size : 1024 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 1 fdiv_bug: no hlt_bug : no f00f_bug: no coma_bug: no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags
2.6.13+netconsole captures crash
On 2.6.13, I have a simple script that tars the data from the root
filesystem to a 400GB disk. When this started, I got the following errors
and then the machine locked up.

Again, this is the 400GB Seagate on the ATA/133 controller. Someone should
add to the CONFIG_OPTION that 400GB drives are NOT supported w/ the Promise
ATA/133 controllers.

Sep 2 21:00:55 p34 hde: dma_timer_expiry: dma status == 0x20
Sep 2 21:00:55 p34 hde: DMA timeout retry
Sep 2 21:00:55 p34 PDC202XX: Primary channel reset.
Sep 2 21:00:55 p34 hde: timeout waiting for DMA
Sep 2 21:00:55 p34 hde: status error: status=0x58 {
Sep 2 21:00:55 p34 DriveReady
Sep 2 21:00:55 p34 SeekComplete
Sep 2 21:00:55 p34 DataRequest
Sep 2 21:00:55 p34 }
Sep 2 21:00:55 p34 ide: failed opcode was:
Sep 2 21:00:55 p34 unknown
Sep 2 21:00:55 p34 hde: drive not ready for command
Sep 2 21:00:55 p34 hde: status error: status=0x50 {
Sep 2 21:00:55 p34 DriveReady
Sep 2 21:00:55 p34 SeekComplete
Sep 2 21:00:55 p34 }
Sep 2 21:00:55 p34 ide: failed opcode was:
Sep 2 21:00:55 p34 unknown
Sep 2 21:00:55 p34 hde: no DRQ after issuing MULTWRITE_EXT
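The status bytes in the log above are the ATA status register, and the words in braces are just its bits spelled out. A small sketch of that decoding; the function and table names are made up, while the bit values follow the standard ATA status register layout and the labels follow the names the kernel prints.

```python
# (bit mask, label) pairs for the ATA status register, high bit first.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the list of flag names set in an ATA status byte."""
    if status & 0x80:
        # While Busy is set the remaining bits are not meaningful.
        return ["Busy"]
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


print(decode_status(0x58))  # ['DriveReady', 'SeekComplete', 'DataRequest']
print(decode_status(0x50))  # ['DriveReady', 'SeekComplete']
print(decode_status(0xD0))  # ['Busy']
```

Read this way, 0x58 means the drive is ready and still asserting DataRequest, and 0x50 is the same without DataRequest. The 0xd0 value appears in the related reports in this thread.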
Kernel 2.6.13 repeated ACPI events?
I have a box where I keep getting this in dmesg:

ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5
ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKD] -> GSI 5 (level, low) -> IRQ 5

# cat /proc/interrupts
           CPU0
  0:    2691916          XT-PIC  timer
  1:        392          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:    1120689          XT-PIC  eth1, eth0
  9:          1          XT-PIC  acpi
 14:       3938          XT-PIC  ide0
 15:         45          XT-PIC  ide1
NMI:          0
LOC:          0
ERR:          0
MIS:          0

Anyone have any idea what could cause this?

# lspci
00:00.0 Host bridge: Intel Corp. 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corp. 82815 815 Chipset AGP Bridge (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801AA PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801AA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801AA IDE (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801AA USB (rev 02)
00:1f.3 SMBus: Intel Corp. 82801AA SMBus (rev 02)
01:00.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
01:04.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
02:00.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model 64/Model 64 Pro] (rev 15)
Re: Linux Kernel 2.6.13-rc7 (WORKS) (2.6.13, DRQ/System CRASH)
Also, Part of the problem may be that I have two ATA/133 Promise cards in one box and only one ATA/133 in the other box. Kernel 2.6.13 has fixed the problem with one ATA/133 card in the box. Kernel 2.6.13 has not fixed the problem with two ATA/133 cards in the box. FYI Justin. On Mon, 5 Sep 2005, Alan Cox wrote: On Llu, 2005-09-05 at 09:26 +0200, Bartlomiej Zolnierkiewicz wrote: After DMA timeout driver reverted back to PIO, ide-taskfile.c also holds PIO code besides IDE Taskfile Access. On SMP after a DMA timeout it will potentially freeze. There are some paths in that code which lead to double lock takes and hangs, plus some timer races. Justin can you make a backup (I mean that seriously), then build a kernel with spin lock debug enabled and see if you can reproduce the problem and get a trace. If its the locking you'll get a trace and the kernel will continue. At that point because the spinlock debug continues unsafely through a double lock after the trace you are in the "danger zone" hence the backup warning [Yes the spin lock debug code really should warn you its dangerous for non debug uses or get patched as it is in Fedora to trace and stop] If its a hardware or other problem it will still hang if its an unrelated lock problem it should still get a trace. Why you see this only on 2.6.13 not 2.6.13-rc7 I don't know. It makes me wonder if you have a bad drive - but then you imply going back to rc7 goes back to stable. Can you therefore also check the .config options between the two kernels match. Alan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Thu, 6 Dec 2007, Andrew Morton wrote: On Sat, 1 Dec 2007 06:26:08 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote: I am putting a new machine together and I have dual raptor raid 1 for the root, which works just fine under all stress tests. Then I have the WD 750 GiB drive (not RE2, desktop ones for ~150-160 on sale now adays): I ran the following: dd if=/dev/zero of=/dev/sdc dd if=/dev/zero of=/dev/sdd dd if=/dev/zero of=/dev/sde (as it is always a very good idea to do this with any new disk) And sometime along the way(?) (i had gone to sleep and let it run), this occurred: [42880.680144] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0x2 frozen Gee we're seeing a lot of these lately. [42880.680231] ata3.00: irq_stat 0x00400040, connection status changed [42880.680290] ata3.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in [42880.680292] res 40/00:ac:d8:64:54/00:00:57:00:00/40 Emask 0x10 (ATA bus error) [42881.841899] ata3: soft resetting port [42885.966320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [42915.919042] ata3.00: qc timeout (cmd 0xec) [42915.919094] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5) [42915.919149] ata3.00: revalidation failed (errno=-5) [42915.919206] ata3: failed to recover some devices, retrying in 5 secs [42920.912458] ata3: hard resetting port [42926.411363] ata3: port is slow to respond, please be patient (Status 0x80) [42930.943080] ata3: COMRESET failed (errno=-16) [42930.943130] ata3: hard resetting port [42931.399628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [42931.413523] ata3.00: configured for UDMA/133 [42931.413586] ata3: EH pending after completion, repeating EH (cnt=4) [42931.413655] ata3: EH complete [42931.413719] sd 2:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB) [42931.413809] sd 2:0:0:0: [sdc] Write Protect is off [42931.413856] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [42931.413867] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA Usually when I see this sort of thing with another box I have full of raptors, it was due to a bad raptor and I never saw it again after I replaced the disk that it happened on, but that was using the Intel P965 chipset. For this board, it is a Gigabyte GSP-P35-DS4 (Rev 2.0) and I have all of the drives (2 raptors, 3 750s connected to the Intel ICH9 Southbridge). I am going to do some further testing but does this indicate a bad drive? Bad cable? Bad connector? As you can see above, /dev/sdc stopped responding for a little bit and then the kernel reset the port. Why is this though? What is the likely root cause? Should I replace the drive? Obviously this is not normal and cannot be good at all, the idea is to put these drives in a RAID5 and if one is going to timeout that is going to cause the array to go degraded and thus be worthless in a raid5 configuration. Can anyone offer any insight here? It would be interesting to try 2.6.21 or 2.6.22. This was due to NCQ issues (disabling it fixed the problem). Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23.9: x86_64: floppy not working: p35 chipset
On Fri, 7 Dec 2007, Bill Davidsen wrote: Justin Piszcz wrote: Trying to format a floppy (2-3 of them) on a GA-P35-DS4 2.0 with a regular Sony floppy on Debian x86_64 with kernel 2.6.23.9: # fdformat /dev/fd0 Could not determine current format type: No such device # mformat a: mformat: Could not get geometry of device (No such device) # # cat /proc/interrupts |grep floppy 6: 38 37 39 41 IO-APIC-edge floppy # dmesg|grep -A1 fd0 [ 52.689487] Floppy drive(s): fd0 is 1.44M [ 52.704661] FDC 0 is a post-1991 82077 During the 'attempted format' I've tried a few different floppies, the result is the same. The system is 64-bit only, no 32-bit emulation is enabled using a strict 64-bit-only userland. Has anyone else gotten their floppy drive to work under 64-bit? Is this just a case of a DOA floppy drive or is something else wrong? Maybe booting from a 32 bit live CD would help determine that. It certainly was seen at boot time. Didn't get hooked to some SCSI device name by udev, did it? -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot Retried with some other floppies and later tried the original, everything seems to be working now, must have been a bad floppy/some transient issue. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.23.9 64-bit IRQ sharing without irqbalance with p35 vs i965?
Can some please explain with almost identical kernel .config's I see this on a p965 Intel board: CPU0 CPU1 CPU2 CPU3 0: 2501669875 0 0 0 IO-APIC-edge timer 1: 8 0 0 0 IO-APIC-edge i8042 7: 0 0 0 0 IO-APIC-edge parport0 8: 1 0 0 0 IO-APIC-edge rtc 9: 0 0 0 0 IO-APIC-fasteoi acpi 12: 21187797 0 0 0 IO-APIC-edge i8042 16: 29096123 0 0 0 IO-APIC-fasteoi sata_sil24, uhci_hcd:usb3 17:122 0 0 0 IO-APIC-fasteoi libata 18: 27177188 0 0 0 IO-APIC-fasteoi sata_sil24, ehci_hcd:usb1, uhci_hcd:usb7 19: 29230654 0 0 0 IO-APIC-fasteoi sata_sil24, ohci1394, uhci_hcd:usb6 21: 41626956 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4, eth1 22:203 0 0 0 IO-APIC-fasteoi HDA Intel 23:108 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb5 377: 590862399 0 0 0 PCI-MSI-edge eth0 378: 89967666 0 0 0 PCI-MSI-edge ahci NMI: 0 0 0 0 LOC: 2501568889 2501566748 2501569508 2501566435 ERR: 0 And this on a Gigabyte p35 chipset: CPU0 CPU1 CPU2 CPU3 0:4798363480624248071034801203 IO-APIC-edge timer 1: 1 2 3 2 IO-APIC-edge i8042 6: 3 0 0 0 IO-APIC-edge floppy 8: 0 0 0 1 IO-APIC-edge rtc 9: 0 0 0 0 IO-APIC-fasteoi acpi 12: 22 32 26 33 IO-APIC-edge i8042 16: 0 0 0 0 IO-APIC-fasteoi ahci, uhci_hcd:usb3 17: 16994 17065 16955 16952 IO-APIC-fasteoi libata, eth0 18: 0 1 1 1 IO-APIC-fasteoi ohci1394, ehci_hcd:usb1, uhci_hcd:usb5, uhci_hcd:usb8 19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb7 21: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4 22: 38 39 40 38 IO-APIC-fasteoi HDA Intel 23: 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6 380:6838702682341168227046836873 PCI-MSI-edge ahci NMI: 0 0 0 0 LOC: 18993695 18993662 18891135 18891030 ERR: 0 Both are running Debian Lenny (testing) with (almost the same exact kernel configuration). Why is this? Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
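Editor's sketch, not part of the original message: one thing worth comparing between two such boards is the per-IRQ affinity mask in /proc/irq/<N>/smp_affinity, which is a hexadecimal CPU bitmask. The helper below only decodes such a mask; the function names are hypothetical, and whether the masks actually differ on these two machines is not stated in the thread.

```python
from pathlib import Path


def cpus_from_affinity_mask(mask):
    """Decode a hex CPU bitmask such as 'f' or '00000000,0000000f' into CPU numbers."""
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def irq_affinity(irq, proc_root="/proc"):
    """Read and decode the affinity mask of one IRQ (proc_root is overridable for testing)."""
    text = (Path(proc_root) / "irq" / str(irq) / "smp_affinity").read_text()
    return cpus_from_affinity_mask(text)


if __name__ == "__main__":
    print(cpus_from_affinity_mask("f"))
```

A mask of "1" would mean the IRQ may only be delivered to CPU0, while "f" allows CPU0 through CPU3.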
Re: [PATCH] x86_32: trim memory by updating e820 v2
On Mon, 21 Jan 2008, Jesse Barnes wrote:

> On Sunday, January 20, 2008 10:56 pm Yinghai Lu wrote:
>> [PATCH] x86_32: trim memory by updating e820 v2
>>
>> when mtrr is not covering all e820 table, need to trim the ram, need to
>> update e820 reuse some code for x86_64 here need to add
>> early_identify_cpu for x86_32, and move mtrr_bp_init early compiled
>> test only, need someone test it
>
> I like this approach too. So as long as the E820 modification method works
> (i.e. we have some testers, maybe Justin can give it a try), you can add
> Signed-off-by: Jesse Barnes <[EMAIL PROTECTED]>
> or
> Acked-by: Jesse Barnes <[EMAIL PROTECTED]>
> as appropriate too.
>
> Thanks,
> Jesse

Subject: Re: [PATCH] x86_32: trim memory by updating e820 v2
                         ^^

I run x86_64 btw-- if there is a kernel.patch for x86_64 please let me know and I can test, thanks.

Justin.
Re: [PATCH] x86_32: trim memory by updating e820 v2
On Mon, 21 Jan 2008, Yinghai Lu wrote: On Monday 21 January 2008 11:14:02 am Justin Piszcz wrote: please get x86.git git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git cd linux-2.6 #--{ x86.git instructions }--> # Add Linus's tree as a remote git remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git # Add Ingo's tree as a remote git remote add x86 git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git # With that setup, just run the following to get any changes you # don't have. It will also notice any new branches Ingo/Linus # add to their repo. Look in .git/config afterwards, the format # to add new remotes is easy to figure out. git remote update #- git merge x86/master git merge x86/mm and apply [PATCH] x86_64: check if Tom2 is enabled http://lkml.org/lkml/2008/1/21/20 [PATCH] x86_64: update e820 instead of updating end_pfn v3 http://lkml.org/lkml/2008/1/21/19 [PATCH] x86_32: trim memory by updating e820 v2 http://lkml.org/lkml/2008/1/21/18 YH Thanks, I am all patched up and ready to test, unfortunately one of my disks in my RAID 1 just died, I already filled out the advanced replacement form, I will test when I receive the replacement disk. p34:~# lilo Fatal: Not all RAID-1 disks are active; use '-H' to install to active disks only p34:~# lilo -H Warning: Partial RAID-1 install on active disks only; booting is not failsafe Warning: Faulty disk in RAID-1 array; boot with caution!! Fatal: Unusual RAID bios device code: 0xFF p34:~# Don't feel like mucking up my system at the moment. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
Quick question,

I set up a new machine last night with two Raptor 150 disks. I set up RAID1 as I do everywhere else, with 0.90.03 superblocks (in order to be compatible with LILO; if you use 1.x superblocks with LILO you can't boot), and then:

/dev/sda1+sdb1 <-> /dev/md0 <-> swap
/dev/sda2+sdb2 <-> /dev/md1 <-> /boot (ext3)
/dev/sda3+sdb3 <-> /dev/md2 <-> / (xfs)

All works fine, no issues...

Quick question though: I turned off the machine, disconnected /dev/sda, and booted from /dev/sdb; no problems, it shows as degraded RAID1. I turned the machine off and re-attached the first drive. When I booted, my first partition had either re-synced by itself or was never degraded; which is it?

So two questions:

1) If it rebuilt by itself, how come it only rebuilt /dev/md0?
2) If it did not rebuild, is it because the kernel knows it does not need to re-calculate parity etc. for swap?

I had to run:

mdadm /dev/md1 -a /dev/sda2
and
mdadm /dev/md2 -a /dev/sda3

to rebuild /boot and /, which worked fine. I am just curious why it works like this; I figured it would be all or nothing.

More info: not using ANY initramfs/initrd images, everything is compiled into 1 kernel image (makes things MUCH simpler and the expected device layout etc. is always the same, unlike initrd/etc).

Justin.
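Editor's sketch, not part of the original message: which arrays are degraded after re-attaching a disk can be read from /proc/mdstat, where a missing member shows up as "_" in the status brackets (for example [_U]). The sample text below is made up for illustration (block counts are not from the thread), and the function name is hypothetical.

```python
import re

# Illustrative /proc/mdstat contents; the block counts are invented.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      16787776 blocks [2/2] [UU]

md1 : active raid1 sdb2[1]
      136448 blocks [2/1] [_U]

md2 : active raid1 sdb3[1]
      129596288 blocks [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status brackets contain a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

Each array reported this way would still need its missing member re-added by hand with mdadm, as described in the message.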
Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
I am putting a new machine together and I have dual raptor raid 1 for the root, which works just fine under all stress tests. Then I have the WD 750 GiB drive (not RE2, desktop ones for ~150-160 on sale now adays): I ran the following: dd if=/dev/zero of=/dev/sdc dd if=/dev/zero of=/dev/sdd dd if=/dev/zero of=/dev/sde (as it is always a very good idea to do this with any new disk) And sometime along the way(?) (i had gone to sleep and let it run), this occurred: [42880.680144] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x401 action 0x2 frozen [42880.680231] ata3.00: irq_stat 0x00400040, connection status changed [42880.680290] ata3.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in [42880.680292] res 40/00:ac:d8:64:54/00:00:57:00:00/40 Emask 0x10 (ATA bus error) [42881.841899] ata3: soft resetting port [42885.966320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [42915.919042] ata3.00: qc timeout (cmd 0xec) [42915.919094] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5) [42915.919149] ata3.00: revalidation failed (errno=-5) [42915.919206] ata3: failed to recover some devices, retrying in 5 secs [42920.912458] ata3: hard resetting port [42926.411363] ata3: port is slow to respond, please be patient (Status 0x80) [42930.943080] ata3: COMRESET failed (errno=-16) [42930.943130] ata3: hard resetting port [42931.399628] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [42931.413523] ata3.00: configured for UDMA/133 [42931.413586] ata3: EH pending after completion, repeating EH (cnt=4) [42931.413655] ata3: EH complete [42931.413719] sd 2:0:0:0: [sdc] 1465149168 512-byte hardware sectors (750156 MB) [42931.413809] sd 2:0:0:0: [sdc] Write Protect is off [42931.413856] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [42931.413867] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Usually when I see this sort of thing with another box I have full of raptors, it was due to a bad raptor and I never saw it 
again after I replaced the disk that it happened on, but that was using the Intel P965 chipset. For this board, it is a Gigabyte GSP-P35-DS4 (Rev 2.0) and I have all of the drives (2 raptors, 3 750s connected to the Intel ICH9 Southbridge). I am going to do some further testing but does this indicate a bad drive? Bad cable? Bad connector? As you can see above, /dev/sdc stopped responding for a little bit and then the kernel reset the port. Why is this though? What is the likely root cause? Should I replace the drive? Obviously this is not normal and cannot be good at all, the idea is to put these drives in a RAID5 and if one is going to timeout that is going to cause the array to go degraded and thus be worthless in a raid5 configuration. Can anyone offer any insight here? Thank you, Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Sat, 1 Dec 2007, Jan Engelhardt wrote:

> On Dec 1 2007 06:26, Justin Piszcz wrote:
>
>> I ran the following:
>>
>> dd if=/dev/zero of=/dev/sdc
>> dd if=/dev/zero of=/dev/sdd
>> dd if=/dev/zero of=/dev/sde
>>
>> (as it is always a very good idea to do this with any new disk)
>
> Why would you care about what's on the disk? fdisk, mkfs and
> the day-to-day operation will overwrite it _anyway_.
>
> (If you think the disk is not empty, you should look at it
> and copy off all usable warez beforehand :-)

The purpose is that with any new disk it's good to write to all the blocks and let the drive do all of the re-mapping before you put 'real' data on it. Let it crap out or fail before I put my data on it.

Justin.
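Editor's sketch, not part of the original message: the burn-in idea above (write every block before trusting the disk) can be extended with a read-back check. The function below is a minimal illustration that works on an ordinary file; its name and parameters are hypothetical. Pointing it at a block device would destroy the data there, and because reads may be served from the page cache it is not a substitute for a real media test.

```python
import os
import tempfile


def write_and_verify(path, size, block_size=1 << 20, pattern=b"\xaa"):
    """Fill `path` with `size` bytes of a one-byte pattern, then read it back.

    Returns the byte offsets of blocks that did not read back as written.
    WARNING: this overwrites `path`.
    """
    block = pattern * block_size
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            chunk = block[:min(block_size, remaining)]
            f.write(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the process and OS buffers

    bad_offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(block_size)
            if not data:
                break
            if data != pattern * len(data):
                bad_offsets.append(offset)
            offset += len(data)
    return bad_offsets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fill.bin")
        print(write_and_verify(target, 4 * 1024 * 1024))
```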
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Sat, 1 Dec 2007, Jan Engelhardt wrote: On Dec 1 2007 06:19, Justin Piszcz wrote: RAID1, 0.90.03 superblocks (in order to be compatible with LILO, if you use 1.x superblocks with LILO you can't boot) Says who? (Don't use LILO ;-) I like LILO :) , and then: /dev/sda1+sdb1 <-> /dev/md0 <-> swap /dev/sda2+sdb2 <-> /dev/md1 <-> /boot (ext3) /dev/sda3+sdb3 <-> /dev/md2 <-> / (xfs) All works fine, no issues... Quick question though, I turned off the machine, disconnected /dev/sda from the machine, boot from /dev/sdb, no problems, shows as degraded RAID1. Turn the machine off. Re-attach the first drive. When I boot my first partition either re-synced by itself or it was not degraded, was is this? If md0 was not touched (written to) after you disconnected sda, it also should not be in a degraded state. So two questions: 1) If it rebuilt by itself, how come it only rebuilt /dev/md0? So md1/md2 was NOT rebuilt? Correct. 2) If it did not rebuild, is it because the kernel knows it does not need to re-calculate parity etc for swap? Kernel does not know what's inside an md usually. And it should not try to be smart. Ok. I had to: mdadm /dev/md1 -a /dev/sda2 and mdadm /dev/md2 -a /dev/sda3 To rebuild the /boot and /, which worked fine, I am just curious though why it works like this, I figured it would be all or nothing. Devices are not automatically readded. Who knows, maybe you inserted a different disk into sda which you don't want to be overwritten. Makes sense, I just wanted to confirm that it was normal.. More info: Not using ANY initramfs/initrd images, everything is compiled into 1 kernel image (makes things MUCH simpler and the expected device layout etc is always the same, unlike initrd/etc). My expected device layout is also always the same, _with_ initrd. Why? Simply because mdadm.conf is copied to the initrd, and mdadm will use your defined order. That is another way as well, people seem to be divided. 
Re: Kernel 2.6.23.9 + mdadm 2.6.2-2 + Auto rebuild RAID1?
On Sat, 1 Dec 2007, Jan Engelhardt wrote:

> On Dec 1 2007 07:12, Justin Piszcz wrote:
>> On Sat, 1 Dec 2007, Jan Engelhardt wrote:
>>> On Dec 1 2007 06:19, Justin Piszcz wrote:
>>>> RAID1, 0.90.03 superblocks (in order to be compatible with LILO, if
>>>> you use 1.x superblocks with LILO you can't boot)
>>>
>>> Says who? (Don't use LILO ;-)
>>
>> I like LILO :)
>
> LILO cares much less about disk layout / filesystems than GRUB does,
> so I would have expected LILO to cope with all sorts of superblocks.
> OTOH I would suspect GRUB to only handle 0.90 and 1.0, where the MDSB
> is at the end of the disk <=> the filesystem SB is at the very
> beginning.
>
>>>> So two questions:
>>>>
>>>> 1) If it rebuilt by itself, how come it only rebuilt /dev/md0?
>>>
>>> So md1/md2 was NOT rebuilt?
>>
>> Correct.
>
> Well it should, after they are readded using -a.
> If they still don't, then perhaps another resync is in progress.

There was nothing in progress, md0 was synced up and md1,md2 = degraded.
Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
On Sat, 1 Dec 2007, Justin Piszcz wrote: On Sat, 1 Dec 2007, Janek Kozicki wrote: Justin Piszcz said: (by the date of Sat, 1 Dec 2007 07:23:41 -0500 (EST)) dd if=/dev/zero of=/dev/sdc The purpose is with any new disk its good to write to all the blocks and let the drive to all of the re-mapping before you put 'real' data on it. Let it crap out or fail before I put my data on it. better use badblocks. It writes data, then reads it afterwards: In this example the data is semi random (quicker than /dev/urandom ;) badblocks -c 10240 -s -w -t random -v /dev/sdc -- Janek Kozicki | - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Will give this a shot and see if I can reproduce the error, thanks. The badblocks did not do anything; however, when I built a software raid 5 and the performed a dd: /usr/bin/time dd if=/dev/zero of=fill_disk bs=1M I saw this somewhere along the way: [30189.967531] RAID5 conf printout: [30189.967576] --- rd:3 wd:3 [30189.967617] disk 0, o:1, dev:sdc1 [30189.967660] disk 1, o:1, dev:sdd1 [30189.967716] disk 2, o:1, dev:sde1 [42332.936615] ata5.00: exception Emask 0x2 SAct 0x7000 SErr 0x0 action 0x2 frozen [42332.936706] ata5.00: spurious completions during NCQ issue=0x0 SAct=0x7000 FIS=004040a1:0800 [42332.936804] ata5.00: cmd 61/08:60:6f:4d:2a/00:00:27:00:00/40 tag 12 cdb 0x0 data 4096 out [42332.936805] res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation) [42332.936977] ata5.00: cmd 61/08:68:77:4d:2a/00:00:27:00:00/40 tag 13 cdb 0x0 data 4096 out [42332.936981] res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation) [42332.937162] ata5.00: cmd 61/00:70:0f:49:2a/04:00:27:00:00/40 tag 14 cdb 0x0 data 524288 out [42332.937163] res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 (HSM violation) [42333.240054] ata5: soft resetting port [42333.494462] ata5: SATA link up 3.0 Gbps (SStatus 123 
SControl 300) [42333.506592] ata5.00: configured for UDMA/133 [42333.506652] ata5: EH complete [42333.506741] sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors (750156 MB) [42333.506834] sd 4:0:0:0: [sde] Write Protect is off [42333.506887] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00 [42333.506905] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Next test, I will turn off NCQ and try to make the problem re-occur. If anyone else has any thoughts here..? I ran long smart tests on all 3 disks, they all ran successfully. Perhaps these drives need to be NCQ BLACKLISTED with the P35 chipset? Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
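Editor's sketch, not part of the original message: for the "turn off NCQ" test, libata exposes the per-disk queue depth in sysfs at /sys/block/<disk>/device/queue_depth, and writing 1 there effectively disables NCQ for that disk until the next reboot. The helper names and the sysfs_root parameter below are hypothetical (the latter exists only so the code can be tried against a scratch directory); on a real system the write needs root.

```python
from pathlib import Path


def queue_depth_path(disk, sysfs_root="/sys"):
    """Path of the queue_depth attribute for a disk such as 'sde'."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"


def get_queue_depth(disk, sysfs_root="/sys"):
    """Current queue depth; 1 means commands are issued one at a time (no NCQ)."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())


def disable_ncq(disk, sysfs_root="/sys"):
    """Set the queue depth to 1, which turns command queuing off for this disk."""
    queue_depth_path(disk, sysfs_root).write_text("1\n")


if __name__ == "__main__":
    import tempfile

    # Demonstrate against a fake sysfs tree rather than the live system.
    with tempfile.TemporaryDirectory() as fake_sys:
        attr = queue_depth_path("sde", fake_sys)
        attr.parent.mkdir(parents=True)
        attr.write_text("31\n")
        disable_ncq("sde", fake_sys)
        print(get_queue_depth("sde", fake_sys))
```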
2.6.23.9: x86_64: floppy not working: p35 chipset
Trying to format a floppy (2-3 of them) on a GA-P35-DS4 2.0 with a regular Sony floppy on Debian x86_64 with kernel 2.6.23.9: # fdformat /dev/fd0 Could not determine current format type: No such device # mformat a: mformat: Could not get geometry of device (No such device) # # cat /proc/interrupts |grep floppy 6: 38 37 39 41 IO-APIC-edge floppy # dmesg|grep -A1 fd0 [ 52.689487] Floppy drive(s): fd0 is 1.44M [ 52.704661] FDC 0 is a post-1991 82077 During the 'attempted format' [34324.175770] floppy0: probe failed... [34324.575342] floppy0: probe failed... [34324.974733] floppy0: probe failed... [34325.374302] floppy0: probe failed... [34325.773746] floppy0: probe failed... [34326.173312] floppy0: probe failed... [34326.572734] floppy0: probe failed... [34326.972241] floppy0: probe failed... [34327.371739] floppy0: probe failed... [34327.771344] floppy0: probe failed... [34328.170727] floppy0: probe failed... [34328.570283] floppy0: probe failed... [34328.969717] floppy0: probe failed... [34329.369275] floppy0: probe failed... [34329.768691] floppy0: probe failed... [34330.168197] floppy0: probe failed... [34330.168257] end_request: I/O error, dev fd0, sector 0 I've tried a few different floppies, the result is the same. The system is 64-bit only, no 32-bit emulation is enabled using a strict 64-bit-only userland. Has anyone else gotten their floppy drive to work under 64-bit? Is this just a case of a DOA floppy drive or is something else wrong? Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_32: trim memory by updating e820 v2
On Fri, 25 Jan 2008, Yinghai Lu wrote: On Jan 25, 2008 4:01 PM, Justin Piszcz <[EMAIL PROTECTED]> wrote: ... Tried it, it worked successfully! With stock kernel, previous way I had to use it was mem=8832M and top showed this: top - 18:53:52 up 1 min, 2 users, load average: 1.03, 0.30, 0.10 Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie Cpu(s): 6.1%us, 2.6%sy, 4.5%ni, 81.3%id, 5.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8039464k total, 1288948k used, 6750516k free, 3640k buffers Swap: 16787768k total,0k used, 16787768k free, 178528k cached With kernel you mentioned and use e820 v3: top - 18:48:13 up 3 min, 6 users, load average: 1.67, 0.68, 0.25 Tasks: 195 total, 2 running, 193 sleeping, 0 stopped, 0 zombie Cpu(s): 18.5%us, 1.2%sy, 1.6%ni, 74.8%id, 3.9%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8037668k total, 1438732k used, 6598936k free, 6844k buffers Swap: 16787768k total,0k used, 16787768k free, 273928k cached No append mem= required. thanks any chance to try 32 bit with higemem64 option? YH My distribution is setup for 64-bit (64bit-clean) only, I do not have a 32-bit userland, so cannot help here, sorry. Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86_32: trim memory by updating e820 v2
On Tue, 22 Jan 2008, Yinghai Lu wrote: On Monday 21 January 2008 01:37:09 pm Justin Piszcz wrote: On Mon, 21 Jan 2008, Yinghai Lu wrote: On Monday 21 January 2008 11:14:02 am Justin Piszcz wrote: please get x86.git git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git cd linux-2.6 #--{ x86.git instructions }--> # Add Linus's tree as a remote git remote add linus git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git # Add Ingo's tree as a remote git remote add x86 git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git # With that setup, just run the following to get any changes you # don't have. It will also notice any new branches Ingo/Linus # add to their repo. Look in .git/config afterwards, the format # to add new remotes is easy to figure out. git remote update #- git merge x86/master git merge x86/mm and apply [PATCH] x86_64: check if Tom2 is enabled http://lkml.org/lkml/2008/1/21/20 [PATCH] x86_64: update e820 instead of updating end_pfn v3 http://lkml.org/lkml/2008/1/21/19 [PATCH] x86_32: trim memory by updating e820 v2 http://lkml.org/lkml/2008/1/21/18 YH Thanks, I am all patched up and ready to test, unfortunately one of my disks in my RAID 1 just died, I already filled out the advanced replacement form, I will test when I receive the replacement disk. please get x86.git and apply [PATCH] x86_32: trim memory by updating e820 v3 http://lkml.org/lkml/2008/1/22/394 Ingo already put other two into the tree. Thanks YH Tried it, it worked successfully! 
With stock kernel, previous way I had to use it was mem=8832M and top showed this: top - 18:53:52 up 1 min, 2 users, load average: 1.03, 0.30, 0.10 Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie Cpu(s): 6.1%us, 2.6%sy, 4.5%ni, 81.3%id, 5.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8039464k total, 1288948k used, 6750516k free, 3640k buffers Swap: 16787768k total,0k used, 16787768k free, 178528k cached With kernel you mentioned and use e820 v3: top - 18:48:13 up 3 min, 6 users, load average: 1.67, 0.68, 0.25 Tasks: 195 total, 2 running, 193 sleeping, 0 stopped, 0 zombie Cpu(s): 18.5%us, 1.2%sy, 1.6%ni, 74.8%id, 3.9%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8037668k total, 1438732k used, 6598936k free, 6844k buffers Swap: 16787768k total,0k used, 16787768k free, 273928k cached No append mem= required. A full dmesg is attached so you can analyze the e820/MTRR mapping. File: dmesg-e820v3patch.txt.bz2 Justin. dmesg-e820v3patch.txt.bz2 Description: Binary data
Re: [PATCH] x86_32: trim memory by updating e820 v2
On Mon, 28 Jan 2008, Ingo Molnar wrote: * Justin Piszcz <[EMAIL PROTECTED]> wrote: Tried it, it worked successfully! With stock kernel, previous way I had to use it was mem=8832M and top showed this: top - 18:53:52 up 1 min, 2 users, load average: 1.03, 0.30, 0.10 Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie Cpu(s): 6.1%us, 2.6%sy, 4.5%ni, 81.3%id, 5.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8039464k total, 1288948k used, 6750516k free, 3640k buffers Swap: 16787768k total,0k used, 16787768k free, 178528k cached With kernel you mentioned and use e820 v3: top - 18:48:13 up 3 min, 6 users, load average: 1.67, 0.68, 0.25 Tasks: 195 total, 2 running, 193 sleeping, 0 stopped, 0 zombie Cpu(s): 18.5%us, 1.2%sy, 1.6%ni, 74.8%id, 3.9%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8037668k total, 1438732k used, 6598936k free, 6844k buffers Swap: 16787768k total,0k used, 16787768k free, 273928k cached No append mem= required. A full dmesg is attached so you can analyze the e820/MTRR mapping. thanks for testing it! The code indeed successfully trimmed your memory map by 64MB: from: [0.00] BIOS-e820: 0001 - 00022c00 (usable) to: [0.00] modified: 0001 - 00022800 (usable) [0.00] modified: 00022800 - 00022c00 (reserved) what happened on your box previously when you booted without any trimming - did it sometimes slow down or something like that? Ingo When I boot the box without any trimming it acts like a 286 or 386, takes about 10 minutes to boot (using raptor disks). Justin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
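Editor's sketch, not part of the original message: the arithmetic behind the 64MB trim can be checked in a few lines. The e820 addresses in the archived text have lost digits, so the two end addresses below are ASSUMED values, chosen only because they are consistent with both the 64MB figure and the mem=8832M workaround quoted in the thread; they are not copied from the log.

```python
MIB = 1 << 20

# ASSUMPTION: reconstructed end addresses of the last usable e820 range.
# The archive shows truncated values; these are consistent with the thread's
# "64MB" trim and "mem=8832M" workaround, nothing more.
OLD_END = 0x22C000000  # end of the BIOS-reported usable range
NEW_END = 0x228000000  # end of the usable range after trimming


def trimmed_mib(old_end, new_end):
    """Amount of RAM, in MiB, moved from 'usable' to 'reserved'."""
    return (old_end - new_end) // MIB


def equivalent_mem_param(usable_end):
    """The mem= boot parameter that would cap memory at the same address."""
    return f"mem={usable_end // MIB}M"


if __name__ == "__main__":
    print(trimmed_mib(OLD_END, NEW_END), equivalent_mem_param(NEW_END))
```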
Reading Bad DVD Under 2.6.10 freezes the box.
I have a DVD with three files on it (1.7 GB, 1.7 GB, 900 MB).

On W2K, when I try to copy the second file, I get a BadCRC error message.

Under Linux, I copy up to about 860 MB (watched via pipebench) and then it freezes the machine. I cannot ping it, get to it, or do anything on the console; I am forced to hard reboot.

Main Question >> Why does Linux 'freeze up' when W2K gives a BadCRC error msg (never freezes)?

The DVD FS is Joliet+ISO (hence why none of the files are bigger than 2 GB). Is this normal? Or is there no checking code to kill the read/etc. when there are errors on DVDs, so that it does not freeze the box?

Thanks.
Re: Re: Reading Bad DVD Under 2.6.10 freezes the box.
Yeah, I can try 2.4.29 later tonight; also, the DVD is not scratched, just formatted with Joliet/ISO instead of UDF (which is what should be used on DVDs).

However, with

dd if=/dev/hdh of=file.img

even with bs=1 for 1 byte at a time, there seems to be no way to get the data off.

With dd, the last time I tried it, it just fails. When I use cp to try to copy the file, it freezes the machine.

This is all under 2.6.10 with a Toshiba 16X DVD-ROM (I can get the model number later).

On Mon, 7 Feb 2005, Xavier Bestel wrote:

> On Monday 07 February 2005 at 08:05 -0500, linux-os wrote:
>>> Main Question >> Why does Linux 'freeze up' when W2K gives a BadCRC error msg
>>> (never freezes)?
>>
>> Of course it should not. However, there were many incomplete
>> changes made in 2.6.nn and some may involve problems with
>> locking, etc.
>
> I don't remember a version of the kernel gracefully handling scratched
> CD/DVD.
>
> Xav
Question regarding e1000 driver and dropped packets (2.6.5 / 2.6.10)?
I have two identical machines [mobo/hardware wise]: Each machine is a Dell GX1p (500MHZ). I have two Intel Gigabit NICs, one in each box, hooked up to a GigE switch. Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller I doubt its the kernel version; does anyone have any suggestions/ideas why one machine has virtually NO overruns/errors/drops and the other has tons? Also, (I doubt this to be the case but I'll ask anyway) - Is the way the NIC's are setup in the box next to other cards / alter their PCI/IRQ routing which would effect error/drop rates? IE: PCI1 - promise card / pata PCI2 - promise card / pata PCI3 - promise card / sata PCI4 - e1000 nic PCI5 - 4 port nic Would it make sense to order them in a different direction? Also, is there a correlation between errors on the NIC and ERR in /proc/interrupts? Secondly, could loading lm-sensors/temperature modules be causing these problems? dmesg from box2 below: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex eth1: Setting full-duplex based on MII#1 link partner capability of 45e1. eth2: Setting full-duplex based on MII#1 link partner capability of 45e1. nfs warning: mount version older than kernel nfs warning: mount version older than kernel nfs warning: mount version older than kernel nfs warning: mount version older than kernel i2c /dev entries driver piix4_smbus :00:07.3: Found :00:07.3 device piix4_smbus :00:07.3: WARNING: SMBus interface has been FORCEFULLY ENABLED! mtrr: no MTRR for fd00,80 found spurious 8259A interrupt: IRQ7. spurious 8259A interrupt: IRQ15. I am currently out of ideas, if anyone can suggest anything, I'd be most greatful, thanks! 
On the first box, there are hardly any problems receiving packets. Note the errors & dropped on the receiving end:

BOX1: (2.6.5)

eth0 Link encap:Ethernet HWaddr 00:0E:0C:00:CD:B1
     inet addr:10.0.2.254 Bcast:10.0.2.255 Mask:255.255.255.0
     inet6 addr: fe80::20e:cff:fe00:cdb1/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:196787934 errors:4 dropped:0 overruns:0 frame:2
     TX packets:101356779 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:2602045376 (2481.5 Mb) TX bytes:4051930608 (3864.2 Mb)
     Base address:0xcc80 Memory:ff02-ff04

BOX1 MODULES:

$ lsmod
Module            Size   Used by
ip_nat_ftp        4016   0
ip_conntrack_ftp  71088  1 ip_nat_ftp

BOX2: (2.6.10)

On another box (same physical HW) I get this:

eth0 Link encap:Ethernet HWaddr 00:0E:0C:00:D2:06
     inet addr:10.0.2.253 Bcast:10.0.2.255 Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
 --> RX packets:446380046 errors:1276833 dropped:1276833 overruns:1276833 frame:0
     TX packets:572550636 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:2351750726 (2.1 GiB) TX bytes:3659840330 (3.4 GiB)
     Base address:0xd8c0 Memory:f8fa-f8fc

BOX2 MODULES:

$ lsmod
Module            Size   Used by
ip_nat_irc        3408   0
ip_conntrack_irc  70480  1 ip_nat_irc
ip_nat_ftp        4112   0
ip_conntrack_ftp  71344  1 ip_nat_ftp
adm1021           11060  0
i2c_piix4         6000   0
i2c_sensor        2784   1 adm1021
i2c_dev           7680   0
i2c_core          18224  4 adm1021,i2c_piix4,i2c_sensor,i2c_dev

I have tried using different cables and ports on the switch; the result is the same.
$ tar cvf /box2/4gb_of_stuff.tar 4gb_of_stuff # then the numbers rise rapidly

After copying only 1-2GB on BOX2, this is what I get:

eth0 Link encap:Ethernet HWaddr 00:0E:0C:00:D2:06
     inet addr:10.0.2.253 Bcast:10.0.2.255 Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:1038733 errors:1459 dropped:1459 overruns:1459 frame:0
     TX packets:560952 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:1491121900 (1.3 GiB) TX bytes:763420385 (728.0 MiB)
     Base address:0xd8c0 Memory:f8fa-f8fc

The only thing that is different is that one has more HDDs and an extra PCI controller or so:

BOX1 LSPCI:

00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:0d.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller
00:0e.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02)
00:0f.0 PCI
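To put the ifconfig counters from the two boxes side by side, a small Python sketch like the following could be used. It is only an illustration: the helper names rx_counters and error_rate are made up for this example, and the two sample strings are the RX lines quoted earlier in this post. One thing the numbers themselves show is that on BOX2 the errors, dropped and overruns counters are all equal, so every reported RX error there is counted as an overrun.

```python
import re

# Matches the RX counter line printed by the ifconfig versions quoted above.
RX_RE = re.compile(
    r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) frame:(\d+)"
)

def rx_counters(ifconfig_text):
    """Return the RX counters from ifconfig output as a dict of ints."""
    match = RX_RE.search(ifconfig_text)
    if match is None:
        raise ValueError("no RX counter line found")
    names = ("packets", "errors", "dropped", "overruns", "frame")
    return dict(zip(names, map(int, match.groups())))

def error_rate(counters):
    """Fraction of received packets that were counted as errors."""
    if counters["packets"] == 0:
        return 0.0
    return counters["errors"] / counters["packets"]

# RX lines copied from the BOX1 (2.6.5) and BOX2 (2.6.10) output above.
BOX1_RX = "RX packets:196787934 errors:4 dropped:0 overruns:0 frame:2"
BOX2_RX = ("RX packets:446380046 errors:1276833 dropped:1276833 "
           "overruns:1276833 frame:0")

if __name__ == "__main__":
    for name, text in (("BOX1", BOX1_RX), ("BOX2", BOX2_RX)):
        counters = rx_counters(text)
        print(name, counters, "error rate: %.6f%%" % (100 * error_rate(counters)))
```

Running the same parser on two samples taken a few seconds apart and subtracting the counters would give the error rate during a single transfer rather than since boot.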
Re: Question regarding e1000 driver and dropped packets (2.6.5 / 2.6.10)?
As far as the temp stuff, it does support i2c over the smbus.

$ sensors
max1617-i2c-0-1a
Adapter: SMBus PIIX4 adapter at 0850
Board: +48C (low = -55C, high = +127C)
CPU: +49C (low = -55C, high = +110C)

Whether it's recommended or not, I'm not sure; I'll try later today without it.

On Tue, 8 Feb 2005, Bukie Mabayoje wrote:

Bukie Mabayoje wrote:

Can you do a simple test? Connect the two boxes to the same switch. (No other box should be on the physical bus.)

1. Send packets from BoxA ---> BoxB (record the stats)
2. Send packets from BoxB ---> BoxA (record the stats)
3. Send packets simultaneously from BoxB -> BoxA and BoxA -> BoxB (record the stats)

If you can find a third box:

4. Send packets [BoxA and BoxC] -> BoxB and BoxB -> BoxA (record the stats)
5. Send packets [BoxB and BoxC] -> BoxA and BoxA -> BoxB (record the stats)

I don't understand why you received more packets on BoxB. A controlled test will help clarify any ambiguity.

[BoxA]
RX packets:196787934 errors:4 dropped:0 overruns:0 frame:2
TX packets:101356779 errors:0 dropped:0 overruns:0 carrier:0

[BoxB]
RX packets:446380046 errors:1276833 dropped:1276833 overruns:1276833 frame:0
TX packets:572550636 errors:0 dropped:0 overruns:0 carrier:0

Justin Piszcz wrote:

I have two identical machines [mobo/hardware wise]: each machine is a Dell GX1p (500 MHz). I have two Intel Gigabit NICs, one in each box, hooked up to a GigE switch.

Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller
Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller

I doubt it's the kernel version; does anyone have any suggestions/ideas why one machine has virtually NO overruns/errors/drops and the other has tons?

Also (I doubt this to be the case, but I'll ask anyway): could the way the NICs are placed in the box next to other cards alter their PCI/IRQ routing in a way that would affect error/drop rates?
IE:
PCI1 - promise card / pata
PCI2 - promise card / pata
PCI3 - promise card / sata
PCI4 - e1000 nic
PCI5 - 4 port nic

What matters is which INT# line [A, B, C, D] and/or combination PCI slots 1, 2, 3 and 4 are using. You can find out by running lspci -vv. Then check whether they are routed to the same system interrupt and, lastly, whether there are interrupt priority issues.

Would it make sense to order them differently?

It may not help in identifying the problem.

Also, is there a correlation between errors on the NIC and ERR in /proc/interrupts?

Maybe.

Secondly, could loading lm-sensors/temperature modules be causing these problems?

You don't have any overrun on this box. My error: it may be related. Try without loading the lm-sensors/temp modules. I don't think your motherboard supports the i2c stuff you are loading. You have the Intel 440BX AGP chipset and there is no i2c interface on it.

dmesg from box2 below:

e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
eth1: Setting full-duplex based on MII#1 link partner capability of 45e1.
eth2: Setting full-duplex based on MII#1 link partner capability of 45e1.
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
nfs warning: mount version older than kernel
i2c /dev entries driver
piix4_smbus :00:07.3: Found :00:07.3 device
piix4_smbus :00:07.3: WARNING: SMBus interface has been FORCEFULLY ENABLED!
mtrr: no MTRR for fd00,80 found
spurious 8259A interrupt: IRQ7.
spurious 8259A interrupt: IRQ15.

I am currently out of ideas; if anyone can suggest anything, I'd be most grateful. Thanks!
Intel Gigabit NIC (2.6.5 -> 2.6.10) Bug(?) Found
What is this e-mail about?

Something in the kernel changed regarding the Intel e1000 driver from 2.6.5 to 2.6.10. The change resulted in thousands of errors when the NIC is receiving data. For the past two weeks I have thought about this and tried everything I could think of; it had really been pestering me. Normally, I never really looked at my ifconfig output for eth0, eth1, etc., because I looked at it a long time ago, with earlier kernels, and noticed it was just fine. I guess I should check my NIC statistics more often.

I have tried the following to figure out why I get so many dropped packets and errors on an interface:

1] New Intel [same model] NIC.
2] Different ports in the switch.
3] New cable.
4] Switched PCI slots for the Intel Gigabit card.
5] Switched BIOS settings/parameters to the exact settings of the other, identical machine.

None of these fixed the problem.

There are two identical machines on the network here, both with the same Intel Gigabit NIC (82541GI/PI). On one there are very few (1-3) errors on the NIC, if any, ever.

The test that reproduces the problem the quickest is

dd if=/dev/zero of=/nfsv3/udp/file.img

where the dd runs on another box, sending to the box that gets the RX errors on the NIC. Generally, there would be about 100 errors every 10 seconds.

Since one machine is running 2.6.5 and the other 2.6.10, I figured it had to be something in the kernel that was causing this. Therefore I grabbed ethtool, installed it and did a basic query of the network setting parameters. Immediately I noticed a difference, which is shown below:

* Box with no problems.

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on

* Box with NIC that generates errors, dropped packets and overrun errors.

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: off
TX: off

According to the manpage:

-A change the pause parameters of the specified ethernet device.
rx on|off Specify if RX pause is enabled.
tx on|off Specify if TX pause is enabled.

# ethtool -A eth0 rx on
# ethtool -A eth0 tx on

My machine now:

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on

Then I re-ran the dd command mentioned earlier and let it run for about ten minutes. Lo and behold, not a single dropped packet, overrun or frame error reported!

RX packets:6157606 errors:0 dropped:0 overruns:0 frame:0

Previously, this is what I would get after only a minute of running that dd command (I also get the errors when copying files etc.; the dd command just speeds things up):

RX packets:6374096 errors:1419 dropped:1419 overruns:1419 frame:0

Afterwards, I no longer have any errors.

To the Intel/Kernel guys:

Question: these are identical machines for the most part, and even the same NICs are used in each box, so why are the settings in 2.6.5 different from those in 2.6.10? I do not believe that it is a distribution-specific error, as I did not even have ethtool installed before I checked this, nor do I see it in any boot scripts.

For now, I will just have it set the proper settings (-A tx on and -A rx on), but is there another way to do this, or did it change in the kernel at some point?

Further investigation reveals that on my main machine, with an onboard Intel PRO/1000 built-in NIC which runs on the CSA bus (A-Bit IC7-G), the pause feature is also off; HOWEVER, this machine (2.6 GHz w/HT) does not exhibit any errors!

RX packets:2471666 errors:0 dropped:0 overruns:0 frame:0
TX packets:56413066 errors:0 dropped:0 overruns:0 carrier:0

Is it a bug that it defaults to off in the newer kernel versions, as it causes MASSIVE errors on the RX side of the fence? Or should people who run gigabit interfaces on slower machines just add the ethtool commands to their startup scripts to avoid the errors? There may be some relationship between CPU speed and whether the machine can keep up with the pause option on or off.
If anyone has any idea of what the pause option is about and why it changed from 2.6.5 to 2.6.10, I'd like to know!

Thanks!
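For anyone who wants to script the check described above, here is a minimal Python sketch, not a definitive tool. It parses text in the format of the "ethtool -a" output quoted in this message and, if RX or TX pause is off, builds the corresponding "ethtool -A" command as an argument list. The function names parse_pause and pause_fix_command are invented for this example, and nothing is executed; the caller decides whether to run the command. As background, these pause parameters correspond to Ethernet flow control (IEEE 802.3x pause frames), which lets a receiver ask its link partner to stop sending briefly.

```python
def parse_pause(ethtool_text):
    """Parse `ethtool -a` style output into {"Autonegotiate": bool, "RX": bool, "TX": bool}."""
    params = {}
    for line in ethtool_text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # The header line ("Pause parameters for eth0:") has an empty value
        # and is skipped; only on/off settings are recorded.
        if sep and value in ("on", "off"):
            params[key.strip()] = (value == "on")
    return params

def pause_fix_command(iface, params):
    """Return the ethtool argument list that turns on any disabled pause
    direction, or None if RX and TX pause are already on."""
    args = []
    for key in ("RX", "TX"):
        if not params.get(key, False):
            args += [key.lower(), "on"]
    if not args:
        return None
    return ["ethtool", "-A", iface] + args

# Sample text in the format shown in this message (box with RX errors).
BAD_BOX = """Pause parameters for eth0:
Autonegotiate: on
RX: off
TX: off
"""

if __name__ == "__main__":
    command = pause_fix_command("eth0", parse_pause(BAD_BOX))
    print(command)
```

A startup script could pass the returned list to subprocess.run as root; whether enabling pause is the right default for a given machine is exactly the open question raised above.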