Ray Van Dolson wrote:
> I posted a thread on this once long ago[1] -- but we're still fighting
> with this problem and I wanted to throw it out here again.
>
> All of our hardware is from Silicon Mechanics (SuperMicro chassis and
> motherboards).
>
> Up until now, all of the hardware has had a single 24-disk expander /
> backplane -- but we recently got one of the new SC847-based models with
> 24 disks up front and 12 in the back -- a dual-backplane setup.
>
> We're using two SSDs in the front backplane as mirrored ZIL/OS (I
> don't think we have the 4K alignment set up correctly) and two drives
> in the back as L2ARC.
>
> The rest of the disks are 1TB SATA disks which make up a single large
> zpool via three 8-disk RAIDZ2s.  As you can see, we don't have the
> server maxed out on drives...

> In any case, this new server gets between 400 and 600 of these timeout
> errors an hour:
>
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> Aug 21 03:10:17 dev-zfs1        Log info 31126000 received for target 8.
> Aug 21 03:10:17 dev-zfs1        scsi_status=0, ioc_status=804b, scsi_state=c
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
> Aug 21 03:10:17 dev-zfs1        Log info 31126000 received for target 8.
> Aug 21 03:10:17 dev-zfs1        scsi_status=0, ioc_status=804b, scsi_state=c
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@8/pci15d9,1...@0/s...@8,0 (sd0):
> Aug 21 03:10:17 dev-zfs1        Error for Command: write(10)    Error Level: Retryable
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Requested Block: 21230708    Error Block: 21230708
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Vendor: ATA    Serial Number: CVEM002600EW
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
> Aug 21 03:10:17 dev-zfs1 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> Aug 21 03:10:21 dev-zfs1 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,3...@8/pci15d9,1...@0 (mpt0):
>
> iostat -xnMCez shows that the first of the two ZIL drives receives
> about twice the number of "errors" as the second drive.
>
> There are no other errors on any other drives -- including the L2ARC
> SSDs -- and the asvc_t times seem reasonably low and don't indicate a
> bad drive to my eyes...
>
> The timeouts above exact a rather large performance penalty on the
> system, both in IO and general usage from an SSH console.  Obvious
> pauses and glitches when accessing the filesystem.

This isn't a timeout. "Unit Attention" is the drive telling the host that it has been reset and has forgotten any negotiation that happened with the controller. It's a couple of decades since I worked on SCSI at this level, but IIRC a drive will return a "Unit Attention" error to the first command issued to it after a reset/power-up, except for a Test Unit Ready command. As the message itself says, this can be caused by a power-on, a reset, or a bus reset.
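For what it's worth, the sense bytes in the log decode mechanically. A minimal sketch in Python -- the lookup tables below are abbreviated to the handful of codes relevant here; the full assignments are in the SPC standard:

```python
# Decode the sense fields reported in the syslog entry above.
# Tables are deliberately incomplete; see SPC/T10 for the full lists.
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x6: "Unit Attention",
}

ASC_ASCQ = {
    (0x29, 0x00): "power on, reset, or bus reset occurred",
    (0x29, 0x02): "SCSI bus reset occurred",
}

def decode_sense(sense_key, asc, ascq):
    key = SENSE_KEYS.get(sense_key, "Unknown")
    detail = ASC_ASCQ.get((asc, ascq), "unrecognized ASC/ASCQ")
    return f"Sense Key: {key}, ASC: {asc:#x} ({detail}), ASCQ: {ascq:#x}"

# The entry logged for sd0: Unit Attention, ASC 0x29 / ASCQ 0x0
print(decode_sense(0x6, 0x29, 0x00))
```

Running it on the values from the log reproduces exactly what sd printed: "Sense Key: Unit Attention, ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0" -- i.e. the drive reporting a reset, not a timeout.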

> The problem _follows_ the ZIL and isn't tied to hardware.  IOW, if I
> switch to using the L2ARC drives as ZIL, those drives suddenly exhibit
> the timeout problems...

A possibility is that the problem is related to the nature of the load a ZIL drive attracts. One scenario is that you are crashing the drive firmware, causing it to reset and reinitialize itself, and therefore to return "Unit Attention" to the next command. (I don't know whether X25-Es can behave this way.)

I would try to correct the 4K alignment on the ZIL at least -- that significantly affects the work the drive has to do internally (as well as its performance), although I've no idea whether it's related to the issue you're seeing.
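Checking alignment is just arithmetic on the partition's starting LBA: with 512-byte sectors, a 4 KiB boundary falls every 8 sectors. A quick sketch, assuming a 4 KiB alignment unit as discussed above -- the start sectors in the example are hypothetical, so substitute the real ones from prtvtoc or format:

```python
SECTOR_SIZE = 512    # bytes per logical sector as presented by the drive
ALIGN_BYTES = 4096   # assumed internal alignment unit of the SSD

def is_4k_aligned(start_lba, sector_size=SECTOR_SIZE):
    """True if the partition's first byte lands on a 4 KiB boundary."""
    return (start_lba * sector_size) % ALIGN_BYTES == 0

# Hypothetical slice start sectors; read the real values from prtvtoc.
print(is_4k_aligned(34))    # 34 * 512 = 17408 bytes: misaligned
print(is_4k_aligned(256))   # 256 * 512 = 128 KiB: aligned
```

A misaligned slice means every 4 KiB ZIL write straddles two of the drive's internal pages, roughly doubling the work the flash controller does per write.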

> If we connect the SSD drives directly to the LSI controller instead of
> hanging off the hot-swap backplane, the timeouts go away.

Again, this may be related to some combination of the load type and physical characteristics.

> If we use SSDs attached to the SATA controllers as ZIL, there are also
> no performance issues or timeout errors.

Why not do this, then? It also avoids running the SATA Tunneling Protocol (STP) across the SAS links and expanders.

> So the problem only occurs with SSD drives acting as ZIL attached to
> the backplane.
>
> This is leading me to believe we have a driver issue of some sort in
> the mpt subsystem unable to cope with the longer command path of
> multiple backplanes.  Someone alluded to this in [1] as well, and it
> makes sense to me.
>
> One quick fix to me would seem to be upping the SCSI timeout values.

The error you included isn't a timeout.

> The SSDs themselves are all Intel X25-Es (32GB) with firmware 8860
> and the LSI 1068 is a SAS1068E B3 with firmware 011c0200 (1.28.02.00).

I'm not intimately familiar with the firmware versions, but if you're having problems, making sure you have the latest firmware is probably a good thing to do.


--
Andrew Gabriel
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss