> Hello Michael,
>
> On 24.1.2007 at 14:36, Michael Schuster wrote:
>
>>> --------------------------------------------------------------
>>> [EMAIL PROTECTED] # zpool status
>>>   pool: pool0
>>>  state: ONLINE
>>>  scrub: none requested
>>> config:
>>
>> [...]
>>
>>> Jan 23 18:51:38 newponit panic[cpu2]/thread=30000e81600:
>>> Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
>>> lack of DiskSuite state
>>> Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total
>>> were available,
>>> Jan 23 18:51:38 newponit so panic to ensure data integrity.
>>
>> this message shows (and the rest of the stack proves) that your panic
>> happened in SVM. It has NOTHING to do with ZFS. So either you pulled the
>> wrong disk, or the disk you pulled also contained SVM volumes (next to
>> ZFS).
>
> I noticed that the panic was in SVM and I'm wondering why the machine
> was hanging. SVM is only running on the internal disks (c0) and I pulled
> a disk from the D1000:
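That md panic is the state-database quorum check doing exactly what it is
documented to do: when fewer than half of the metadb replicas are reachable,
md panics rather than keep running on a possibly stale configuration. It
costs nothing to double-check where your replicas actually live and rule out
Michael's theory. Something along these lines shows it at a glance (the
slice in the second command is purely an example, not a suggestion for your
layout):

    # list all state database replicas, their locations and status flags
    metadb -i

    # if every replica sits on only a couple of internal disks, losing one
    # disk can drop you below quorum; extra replicas on another slice
    # spread that risk (example slice only)
    metadb -a -c 2 c0t1d0s7

If metadb confirms that every replica really is on the internal c0 disks,
then the pulled D1000 disk cannot have taken any of them away.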
So the device that was affected had nothing to do with SVM at all. Fine ...
I have the exact same config here: internal SVM, and then external ZFS on
two disk arrays on two controllers.

> Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
> Jan 23 17:24:14 newponit SCSI transport failed: reason 'incomplete':
> retrying command
> Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
> Jan 23 17:24:14 newponit disk not responding to selection
> Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING:
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
> Jan 23 17:24:18 newponit disk not responding to selection
>
> This is clearly the disk with ZFS on it: SVM has nothing to do with this
> disk. A minute later, the trouble started with the internal disks:

Okay ... so are we back to looking at ZFS, or at ZFS plus the SVM
components, or at some interaction between these kernel modules? At this
point I have to be careful not to fall into a pit of blind ignorance as I
grope for the answer. Perhaps some data would help. Was there a core file
in /var/crash/newponit?

> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info]
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit Cmd (0x60000a3ed10) dump for Target 0 Lun 0:
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info]
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit cdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0 0x0 0x10 0x0 ]
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info]
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit pkt_flags=0x4000 pkt_statistics=0x60 pkt_state=0x7
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info]
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit pkt_scbp=0x0 cmd_flags=0x860
> Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit Disconnected tagged cmd(s) (1) timeout for
> Target 0.0

So, a pile of SCSI noise above there ... exactly what one would expect from
a suddenly missing SCSI device.

> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
> fault detected in device; service still available
> Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
> Disconnected tagged cmd(s) (1) timeout for Target 0.0

NCR SCSI controllers ... what OS revision is this? Solaris 10 U3? Solaris
Nevada snv_55b?

> Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING:
> ID[SUNWpd.glm.cmd_timeout.6018]
> Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
> /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
> Jan 23 17:25:26 newponit got SCSI bus reset
> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
> fault detected in device; service still available
>
> The SVM and ZFS disks are on separate SCSI buses, so theoretically there
> shouldn't be any impact on the SVM disks when I pull out a ZFS disk.

I still feel that you hit a bug in ZFS somewhere. Under no circumstances
should a Solaris server panic and crash simply because you pulled out a
single disk that was fully mirrored. In fact ... I will reproduce those
conditions here and then see what happens for me.
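For the record, this is roughly the test I have in mind; the pool and
device names are made up for illustration and will differ on your box and
on mine:

    # mirrored test pool with one half on each controller (names illustrative)
    zpool create testpool mirror c1t0d0 c2t0d0

    # confirm the SVM replicas and the pool are healthy before the pull
    metadb -i
    zpool status testpool

    # physically pull one half of the mirror, then see whether anything
    # beyond the expected DEGRADED pool state shows up
    zpool status -x

    # if the box panics anyway, the dump should land under /var/crash;
    # pull the panic string and the panic thread's stack out of it
    cd /var/crash/`uname -n`
    echo "::status" | mdb -k unix.0 vmcore.0
    echo "::stack"  | mdb -k unix.0 vmcore.0

If my machine rides through the pull with nothing worse than a DEGRADED
pool, then something on your box is different, and the crash dump is the
place to find out what.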
Dennis

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss