You are both right. More below...

On Sep 10, 2010, at 2:06 PM, Piotr Jasiukajtis wrote:
> I don't have any errors from fmdump or syslog.
> The machine is SUN FIRE X4275. I don't use mpt or lsi drivers.
> It could be a bug in a driver since I see this on 2 of the same machines.
>
> On Fri, Sep 10, 2010 at 9:51 PM, Carson Gaspar <car...@taltos.org> wrote:
>> On 9/10/10 4:16 PM, Piotr Jasiukajtis wrote:
>>>
>>> Ok, now I know it's not related to the I/O performance, but to the ZFS
>>> itself.
>>>
>>> At some time all 3 pools were locked in that way:
>>>
>>>                     extended device statistics       ---- errors ---
>>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
>>>     0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   1   0   1 c8t0d0
>>>     0.0    0.0    0.0    0.0  0.0  8.0    0.0    0.0   0 100   0   0   0   0 c7t0d0
>>
>> Nope, most likely your disks or disk controller/driver. Note that you have
>> 8 outstanding I/O requests that aren't being serviced. Look in your syslog,
>> and I bet you'll see I/O timeout errors. I have seen this before with
>> Western Digital disks attached to an LSI controller using the mpt driver.
>> There was a lot of work diagnosing it, see the list archives - an
>> /etc/system change fixed it for me (set xpv_psm:xen_support_msi = -1), but
>> I was using a xen kernel. Note that replacing my disks with larger Seagate
>> ones made the problem go away as well.

In this case, the diagnosis is correct: I/Os are stuck at the drive and are
not being serviced. This is clearly visible as actv > 0, asvc_t == 0, and the
derived %b == 100%.

However, the error counters are also 0 for the affected device: s/w, h/w, and
trn. In many cases where we see I/O timeouts and devices aborting commands,
these get logged as transport (trn) errors. iostat reports these counters
since boot, not per sample period, so we know that whatever is getting stuck
isn't getting unstuck. The symptom we see with questionable devices in the
HBA-to-disk path is hundreds, thousands, or millions of transport errors
reported.

Next question: what does the software stack look like? I knew the sd driver
intimately at one time (pictures were in the Enquirer :-) and it will retry
and send resets that will ultimately get logged.

In this case, we know that at least one hard error was returned for c8t0d0,
so there is an ereport somewhere with the details; try "fmdump -eV". (A
couple of command sketches for that, and for watching the iostat error
counters, are at the end of this message.)

This is not a ZFS bug and cannot be fixed at the ZFS layer.
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
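P.S. To watch whether those per-device error counters are actually moving,
something like the following should work on a stock Solaris/OpenSolaris box
(the 10-second interval is just an example):

    # extended stats plus the error columns (s/w, h/w, trn, tot) every 10 s;
    # the counters are cumulative since boot, so look for deltas between samples
    iostat -xne 10

    # per-device error and identification summary (vendor, product, serial,
    # soft/hard/transport error totals)
    iostat -En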
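And to dig the details of that one hard error on c8t0d0 out of FMA, roughly:

    # one line per error report in the FMA error log
    fmdump -e

    # full detail for each ereport (class, detector, payload)
    fmdump -eV | less

    # anything the diagnosis engines have already turned into a fault
    fmadm faulty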