On Monday, February 29, 2016 07:29:03 PM Harry Schmalzbauer wrote:
>  Bezüglich Harry Schmalzbauer's Nachricht vom 28.02.2016 20:55 (localtime):
> >  Hello,
> >
> > I have a remote machine with a probably defective ODD, but until r294989
> > (from Jan 28th) I could boot with just these warnings:
> > (cd1:ahcich1:0:0:0): READ(10). CDB: 28 00 00 38 85 e0 00 00 01 00
> > (cd1:ahcich1:0:0:0): CAM status: SCSI Status Error
> > (cd1:ahcich1:0:0:0): SCSI status: Check Condition
> > (cd1:ahcich1:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read
> > error)
> > (cd1:ahcich1:0:0:0): Error 5, Unretryable error
> > (cd1:ahcich1:0:0:0): cddone: got error 0x5 back
> > …
> >
> > beta3 doesn't boot anymore, it's hanging with ahci-timeouts:
> > ahcich2: Timeout on slot 11 port 0
> > ahcich2: is 00000008 cs 00000000 ss 00000000 rs 00000800 tfd 40 derr
> > 00000000 cmd 0004cb17
> > (ada1:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 01 ae a3 50 40 5d 01 00
> > 00 00 00
> > ...
> > (aprobe0:ahcich2:0:0:0) ATA_IDENTIFY. ACB eec 00 00 00 00 40 00 00 00 00
> > 00 00
> > (aprobe0:ahcich2:0:0:0) CAM status: Command timeout
> > (aprobe0:ahcich2:0:0:0) Error 5, Retry was blocked
> > ada1 detached
> > ...
> > The numbers (first ACB) and also the channel varies from time to time
> 
> I could narrow it down to r295480
> (https://svnweb.freebsd.org/base?view=revision&revision=295480)
> 
> Reverting that lets the machine boot again.
> 
> I captured verbose boot messages, finding out that problem relaxes with
> verbose-booting, since ahci seems to recover:
> …
> TSC timecounter discards lower 1 bit(s)
> Timecounter "TSC-low" frequency 1746033500 Hz quality -100
> ahcich2: Timeout on slot 12 port 0
> ahcich2: is 00000008 cs 00000000 ss 00000000 rs 00001000 tfd 40 serr
> 00000000 cmd 0004cc17
> ahcich2: AHCI reset...
> (ada1:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 04 71 a3 50 40 5d 01 00
> 00 00 00
> (ada1:ahcich2:0:0:0): CAM status: Command timeout
> (ada1:ahcich2:0:0:0): Retrying command
> ahcich2: SATA connect time=100us status=00000123
> ahcich2: AHCI reset: device found
> ahcich2: AHCI reset: device ready after 100ms
> ahcich1: SNTF 0x0001
> ahcich1: SNTF 0x0001
> …
> 
> I have checked twice that r295480 introduces boot failure here.
> 
> I have absolutely no idea where/how/why/what race happens...
> 
> Thanks for any hints,

That is most bizarre.  Does HEAD boot fine on this machine?  The change
in question probably alters the timing of startup a bit since the random
kthread is placed on the run queue later which might affect the relative
order of kthreads as they start executing, but that would just mean it is
exposting a race in some other part of the system.

-- 
John Baldwin
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Reply via email to