Hi all;

We have a Solaris 10 U9 x86 instance running on Silicon Mechanics /
SuperMicro hardware.

Occasionally under high load (ZFS scrub for example), the box becomes
non-responsive (it continues to respond to ping but nothing else works
-- not even the local console).  Our only solution is to hard reset
after which everything comes up normally.

Logs are showing the following:

  Jan  8 09:44:08 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
  Jan  8 09:44:08 prodsys-dmz-zfs2        MPT SGL mem alloc failed
  Jan  8 09:44:08 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
  Jan  8 09:44:08 prodsys-dmz-zfs2        Unable to allocate dma memory for 
extra SGL.
  Jan  8 09:44:08 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
  Jan  8 09:44:08 prodsys-dmz-zfs2        Unable to allocate dma memory for 
extra SGL.
  Jan  8 09:44:10 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
  Jan  8 09:44:10 prodsys-dmz-zfs2        Unable to allocate dma memory for 
extra SGL.
  Jan  8 09:44:10 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: 
/pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
  Jan  8 09:44:10 prodsys-dmz-zfs2        MPT SGL mem alloc failed
  Jan  8 09:44:11 prodsys-dmz-zfs2 rpcmod: [ID 851375 kern.warning] WARNING: 
svc_cots_kdup no slots free

I am able to resolve the last error by adjusting upwards the duplicate
request cache sizes, but have been unable to find anything on the MPT
SGL errors.

Anyone have any thoughts on what this error might be?

At this point, we are simply going to apply patches to this box (we do
see an outstanding mpt patch):

147150 -- < 01 R-- 124 SunOS 5.10_x86: mpt_sas patch
147702 -- < 03 R--  21 SunOS 5.10_x86: mpt patch

But we have another identically configured box at the same patch level
(admittedly with slightly less workload, though it also undergoes
monthly zfs scrubs) which does not experience this issue.

Ray
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to