Ian Lepore wrote:
> On Wed, 2013-05-29 at 16:21 +0200, Oliver Fromme wrote:
> > Steven Hartland wrote:
> > > Have you checked your sata cables and psu outputs?
> > >
> > > Both of these could be the underlying cause of poor signalling.
> >
> > I can't easily check that because it is a cheap rented
> > server in a remote location.
> >
> > But I don't believe it is bad cabling or PSU anyway, or
> > otherwise the problem would occur intermittently all the
> > time if the load on the disks is sufficiently high.
> > But it only occurs at tags=3 and above. At tags=2 it does
> > not occur at all, no matter how hard I hammer on the disks.
> >
> > At the moment I'm inclined to believe that it is either
> > a bug in the HDD firmware or in the controller. The disks
> > aren't exactly new, they're 400 GB Samsung ones that are
> > several years old. I think it's not uncommon to have bugs
> > in the NCQ implementation in such disks.
> >
> > The only thing that puzzles me is the fact that the problem
> > also disappears completely when I reduce the SATA rev from
> > II to I, even at tags=32.
>
> It seems to me that you dismiss signaling problems too quickly.
> Consider the possibilities... A bad cable leads to intermittent errors
> at higher speeds. When NCQ is disabled or limited the software handles
> these errors pretty much transparently. When NCQ is not limited and
> there are many outstanding requests, suddenly the error handling in the
> software breaks down somehow and a minor recoverable problem becomes an
> in-your-face error.
>
> I'm not saying any of the foregoing is true, just that you should
> consider the possibility that you're dealing with multiple problems
> which are only loosely coupled, but together can seem like a single more
> serious problem. You don't know enough yet to casually dismiss
> anything.

Well ... I also can't dismiss the possibility that there is
a mouse in the machine that is pulling the SATA cables twice
every minute. :-)

But seriously ... I don't see how bad cabling could cause
errors at tags=3 and no errors at all at tags=2. It shouldn't
make a difference to the cables whether two or three tags
are in use. And by the way, it doesn't make any difference
whether I use tags=3 or tags=32; the error rate is the same
in both cases (about two per minute during buildworld).

I have googled a bit; the Samsung HD401LJ and HD403LJ don't
seem to be innocent ... There are lots of pages mentioning
problems with NCQ and SATA I vs. II.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG