SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc
Admittedly we are not having much luck with the x4500s. This time it was the new x4500, running Solaris 10 5/08. Drive "/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd30):" stopped responding, and even after a hard reset it would simply repeat "retryable", "reset", and "fatal" messages forever, so we were unable to log in on the console.

Again we ended up with the problem of knowing which HDD is actually broken. It turned out to be drive #40. (Has anyone got a map we can print? Since we couldn't boot the server, any Unix commands needed to do the mapping are a bit useless, nor do we have an "hd" utility.)

That an HDD died in the first month of operation is understandable, but does it really have to take the whole server with it? Not to mention stopping it from booting. Eventually the NOC staff guessed the correct drive from the blinking of the LEDs (no LED was red), and we were able to boot.

Log output:

Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 670675 kern.info] NOTICE: marvell88sx5: device on port 3 reset: device disconnected or device error
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix port 3: device reset
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix port 3: link lost
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix port 3: link established
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx5: error on port 3:
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] device error
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] device disconnected
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] device connected
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] EDMA self disabled
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd30):
Aug 11 08:47:59 x4500-02.unix Error for Command: read Error Level: Retryable
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] Requested Block: 439202 Error Block: 439202
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] Sense Key: No Additional Sense
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0

 scrub: resilver in progress, 10.27% done, 2h14m to go

Perhaps not related, but equally annoying:

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Aug 11 08:16:32.3925 64da6f29-4dda-44aa-e9ca-ad7054aaeaa1 ZFS-8000-D3
Aug 11 09:08:18.7834 086e6170-e4c7-c66b-c908-e37840db7e96 ZFS-8000-D3
# fmdump -v -u 086e6170-e4c7-c66b-c908-e37840db7e96
TIME                 UUID                                 SUNW-MSG-ID
Aug 11 09:08:18.7834 086e6170-e4c7-c66b-c908-e37840db7e96 ZFS-8000-D3
^C^Z^\

Alas, "kill -9" does not kill fmdump either, and it appears to lock up the server as well. I will remove the command for now, as it hangs the server every time. Hard reset done again.

Lund

-- 
Jorgen Lundman       | <[EMAIL PROTECTED]>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3 -3375-1767 (home)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
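On the question of finding the failing drive: the marvell88sx lines in the log output above already name the SATA controller instance and port (marvell88sx5, port 3). What follows is a minimal sketch, not an existing Sun tool, that tallies marvell88sx reset/error events per controller and port from a syslog file in that format (for example /var/adm/messages once the box is up, or a copy of it on another machine). It assumes Python 3.6+. The function name count_port_events is hypothetical. It deliberately does not translate controller/port into the physical bay number on the chassis; that step still needs a hardware map.

```python
import re
import sys
from collections import Counter

# Matches the per-port marvell88sx messages seen in the log, e.g.
#   "NOTICE: marvell88sx5: device on port 3 reset: ..."
#   "WARNING: marvell88sx5: error on port 3:"
# The leading "marvell88sx: [ID ...]" tag has no instance number, so it
# does not match; only the "marvell88sxN:" part of the message does.
PORT_EVENT = re.compile(r"marvell88sx(\d+): (?:device on port|error on port) (\d+)")


def count_port_events(lines):
    """Count marvell88sx reset/error events per (controller instance, port)."""
    counts = Counter()
    for line in lines:
        m = PORT_EVENT.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


# Only reads a file when a path is given on the command line (hypothetical usage).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for (ctrl, port), n in sorted(count_port_events(f).items()):
            print(f"marvell88sx{ctrl} port {port}: {n} events")
```

Run against the excerpt above, the only pair reported would be controller 5, port 3. Continuation lines such as "device error" carry no instance number and are not counted, which keeps the tally to one entry per reset or error notice.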