On Wed, Jan 17, 2007 at 04:35:02PM -0600, Miller, Mike (OS Dev) wrote: > > On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote: > > > Hello, > > > We've been seeing some nasty data corruption issues on some > > platforms. > > > We've been capturing PCI-E traces looking for something > > nasty but we > > > haven't found anything yet. One of the hardware guys if asking if > > > there is a call in Linux to issue a PME_Turn_Off broadcast message. > > > > > > PME_Turn_Off Broadcast Message > > > Before main component power and reference clocks are turned > > off, the > > > Root Complex or Switch Downstream Port must issue a > > broadcast Message > > > that instructs all agents downstream of that point within the > > > hierarchy to cease initiation of any subsequent PM_PME Messages, > > > effective immediately upon receipt of the PME_Turn_Off Message. > > > > > > This must be initiated from the root complex. Is there such > > a call in > > > linux? > > > > This firmware that implements the PCI-E connection should do > > this, I don't think there is anything that the Operating > > system can do to control this, as PCI-E should be transparant > > to the OS. > > Hmmm, the hw folks tell me that "other" os'es implement that. But I > would tend to agree that system firmware should probably be doing this.
Where would the "other" oses implement this, as they don't even know the pci device is a pci-e port? How can the os send a PCI-E message without talking directly to the chipset-specific controller chip? > > > > Unless this is on a PCI-E Hotplug system? What is the > > No hotplug. That's good :) > > sequence of events that cause the data corruption? > > Install rhel4 u4 on ia64, at the reboot prompt let the system sit idle > for several hours or overnight. Then after rebooting the filesystems are > totally trashed. I usually get a message that the kernel is not a valid > compressed file format. If I try to rescue the system I cannot mount any > filesystems. I don't have the message handy but it complains about an > invalid Verneed record, whatever that is. The RHEL4 kernel is pretty old as far as PCI-E goes. Can you try this on a kernel.org release? 2.6.19.2 would be great at the least. If not, you're going to have to get your support from Red Hat on this issue :( Any kernel log messages while the machine is idle before rebooting? What tasks are running overnight that would cause writes to the disk? thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/