Torsten Kaiser wrote:
>>> That missing +1 would explain why the SGE_TRM never gets set.
>> Thanks a lot for tracking this down. Does changing the above code fix
>> your problem?
>
> I did not try it.
> I'm not a libata expert and while this change looks suspicious, I
> can't be 100% sure if that change was intended.
> And I did not want to experiment this deep in the code and risk
> corrupting the whole drive.
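
For illustration only, here is a minimal sketch of how a missing +1 in a "last entry" check can leave a terminator flag unset on every scatter/gather entry. The struct layouts, the helper names (is_last_buggy, is_last_fixed, fill_sg, has_terminator) and the bit position used for SGE_TRM are assumptions made for this example. They are not copied from sata_sil24 or libata, and endianness conversion is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for the "this is the last entry" flag. */
#define SGE_TRM (1u << 31)

/* Hypothetical hardware scatter/gather entry. */
struct sge {
    uint64_t addr;
    uint32_t cnt;
    uint32_t flags;
};

/* Hypothetical software-side segment description. */
struct seg {
    uint64_t addr;
    uint32_t len;
};

/* Buggy check: compares against one-past-the-end, so it is never
 * true for any element that the fill loop actually visits. */
static int is_last_buggy(const struct seg *cur, const struct seg *first, size_t n)
{
    return cur == first + n;
}

/* Fixed check: the +1 makes the comparison true for the final element. */
static int is_last_fixed(const struct seg *cur, const struct seg *first, size_t n)
{
    return cur + 1 == first + n;
}

/* Copy n segments into the hardware table, flagging the last one. */
static void fill_sg(struct sge *out, const struct seg *segs, size_t n,
                    int (*is_last)(const struct seg *, const struct seg *, size_t))
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        out[i].addr = segs[i].addr;
        out[i].cnt = segs[i].len;
        out[i].flags = is_last(&segs[i], segs, n) ? SGE_TRM : 0;
    }
}

/* Report whether the final table entry carries the terminator flag. */
static int has_terminator(const struct sge *tbl, size_t n)
{
    return (tbl[n - 1].flags & SGE_TRM) != 0;
}
```

With the buggy check, has_terminator() reports 0 for a table of any length. With the fixed check, only the final entry carries the flag.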

I don't think you would risk too much by changing that bit of code.
Please try it.

>>> But I still don't understand how the kernel could fail only
>>> sometimes at bootup, but after that work without any visible
>>> errors. Is the sil-chip rather intelligent about detecting corrupted
>>> sglists and silently ignoring them?
>> I have no idea why it fails only sometimes.
>
> And that is why I'm so unsure.
> The error looks too serious to only cause random failures on one of two
> drives on bootup.
> I never had trouble with the remaining drive on the SiI-chip or both
> drives if one got killed during booting.
>
> I'm guessing that leaving the computer powered down long enough fills
> the RAM with a special pattern that really hangs the drive, while
> normally it would just reject the invalid data. (I have ECC-RAM, does
> this matter?)
>
> Another guess might be that most of the time the SiI-chip correctly
> terminates after the transfer-length is reached, even if SGE_TRM is
> missing...

I have no idea either. We'll probably need a PCI bus tracer to tell
exactly what's going on.

Thanks.

-- 
tejun