> On 5. Aug 2021, at 11:11, Michelle <[email protected]> wrote:
>
> I removed the drive in order to do a backup before I start messing around
> with things, which is why it isn't in the iostat. The backup will take
> probably until early evening.
>
> This is what happened from messages around that time. Almost looks like
> whatever happened, it rebooted.
>
From those, I’d say, you need to replace that disk.

rgds,
toomas

>
> Aug 5 01:55:01 jaguar smbd[601]: [ID 617204 daemon.error] Can't get
> SID for ID=0 type=1, status=-9977
> Aug 5 01:58:00 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 01:58:00 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 01:58:00 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 01:58:00 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 01:58:09 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 01:58:09 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 01:58:09 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 01:58:09 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:15 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 02:00:15 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 02:00:15 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 02:00:16 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:20 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 02:00:20 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 02:00:20 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 02:00:20 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:24 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 02:00:24 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 02:00:24 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 02:00:24 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:24 jaguar ahci: [ID 811322 kern.info] NOTICE: ahci0:
> ahci_tran_reset_dport port 3 reset device
> Aug 5 02:00:29 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 02:00:29 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 02:00:29 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 02:00:29 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:34 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 02:00:34 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 02:00:34 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 02:00:34 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:38 jaguar ahci: [ID 296163 kern.warning] WARNING: ahci0:
> ahci port 3 has task file error
> Aug 5 02:00:38 jaguar ahci: [ID 687168 kern.warning] WARNING: ahci0:
> ahci port 3 is trying to do error recovery
> Aug 5 02:00:38 jaguar ahci: [ID 693748 kern.warning] WARNING: ahci0:
> ahci port 3 task_file_status = 0x4041
> Aug 5 02:00:38 jaguar ahci: [ID 657156 kern.warning] WARNING: ahci0:
> error recovery for port 3 succeed
> Aug 5 02:00:53 jaguar fmd: [ID 377184 daemon.error] SUNW-MSG-ID:
> ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
> Aug 5 02:00:53 jaguar EVENT-TIME: Thu Aug 5 02:00:53 UTC 2021
> Aug 5 02:00:53
> jaguar PLATFORM: ProLiant-MicroServer, CSN: 5C7351P4L9,
> HOSTNAME: jaguar
> Aug 5 02:00:53 jaguar SOURCE: zfs-diagnosis, REV: 1.0
>
>
> On Thu, 2021-08-05 at 11:03 +0300, Toomas Soome via openindiana-discuss
> wrote:
>>> On 5. Aug 2021, at 10:52, Michelle <[email protected]> wrote:
>>>
>>> Thanks for this. So I'm possibly better off rolling back the OS
>>> snapshot after my backup has finished?
>>
>> maybe, maybe not. first of all, I have no idea to what point the
>> rollback would be.
>>
>> secondly, the system has seen some errors; at this time, the problem
>> is that it does not tell us whether those were checksum errors or
>> something else, and it seems to me it is something else.
>>
>> and this is why: if you look at your zpool output, you see a report
>> about c6t3d0, but the iostat -En output below does not include
>> c6t3d0. It seems to be missing.
>>
>> what do you get from: 'iostat -En c6t3d0'?
>>
>> Also, it would be a good idea to check /var/adm/messages: are there
>> any SATA or IO related messages around August 5, 02:00?
>>
>> FMA definitely has recorded an issue about the pool, so there must be
>> something going on.
>>
>> rgds,
>> toomas
>>
>>> I have removed the drive for the moment, and am running a backup.
>>> Just in case :-)
>>>
>>> mich@jaguar:~$ iostat -En
>>> c5d1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
>>> Model: INTEL SSDSA2M04 Revision: Serial No: CVGB949301PC040
>>> Size: 40.02GB <40019116032 bytes>
>>> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
>>> Illegal Request: 0
>>> c6t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
>>> Vendor: ATA Product: WDC WD40EZRZ-00G Revision: 0A80 Serial No: WD-WCC7K5UK24LJ
>>> Size: 4000.79GB <4000787030016 bytes>
>>> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
>>> Illegal Request: 0 Predictive Failure Analysis: 0
>>> c6t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
>>> Vendor: ATA Product: WDC WD60EFRX-68L Revision: 0A82 Serial No: WD-WX21DA84EH0F
>>> Size: 6001.18GB <6001175126016 bytes>
>>> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
>>> Illegal Request: 0 Predictive Failure Analysis: 0
>>> c6t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
>>> Vendor: ATA Product: WDC WD60EFRX-68L Revision: 0A82 Serial No: WD-WX51DB880RJ4
>>> Size: 6001.18GB <6001175126016 bytes>
>>> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
>>> Illegal Request: 0 Predictive Failure Analysis: 0
>>>
>>>
>>> --------------- ------------------------------------ -------------- ---------
>>> TIME            EVENT-ID                             MSG-ID         SEVERITY
>>> --------------- ------------------------------------ -------------- ---------
>>> Aug 05 02:00:53 c5934fd6-5f4b-409e-b0f8-8f44ea8f99c4 ZFS-8000-FD    Major
>>>
>>> Host : jaguar
>>> Platform : ProLiant-MicroServer Chassis_id : 5C7351P4L9
>>> Product_sn :
>>>
>>> Fault class : fault.fs.zfs.vdev.io
>>> Affects : zfs://pool=jaguar/vdev=740c01ae0d3c3109
>>> faulted and taken out of service
>>> Problem in : zfs://pool=jaguar/vdev=740c01ae0d3c3109
>>> faulted and taken out of service
>>>
>>> Description : The number of I/O errors associated with a ZFS device
>>> exceeded acceptable
>>> levels. Refer to http://illumos.org/msg/ZFS-8000-FD for more
>>> information.
>>>
>>> Response : The device has been offlined and marked as faulted. An
>>> attempt will be made to activate a hot spare if available.
>>>
>>> Impact : Fault tolerance of the pool may be compromised.
>>>
>>> Action : Run 'zpool status -x' and replace the bad device.
>>>
>>>
>>>
>>> On Thu, 2021-08-05 at 10:22 +0300, Toomas Soome via openindiana-discuss
>>> wrote:
>>>>> On 5. Aug 2021, at 09:35, Michelle <[email protected]> wrote:
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> About a month ago I updated my Hipster...
>>>>> SunOS jaguar 5.11 illumos-ca706442e6 i86pc i386 i86pc
>>>>>
>>>>> This morning it was absolutely crawling. Couldn't even connect via
>>>>> SSH and had to bounce the box.
>>>>>
>>>>> It was reporting a drive as faulted, but didn't give any numbers...
>>>>> everything was 0. I'm now not sure what happened and whether the
>>>>> drive is good, or whether I should roll back the OS.
>>>>>
>>>>> (and the drive, a WD Red 6TB (not shingled), went out of warranty a
>>>>> week ago. How about that, eh?)
>>>>>
>>>>> Grateful for any opinions please.
>>>>>
>>>>> Thu 5 Aug 04:00:01 UTC 2021
>>>>> NAME   SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH    ALTROOT
>>>>> lion   5.45T 5.28T  176G  -        -         4%    96%  1.00x  DEGRADED  -
>>>>> pool: jaguar
>>>>> state: DEGRADED
>>>>> status: One or more devices are faulted in response to persistent
>>>>> errors. Sufficient replicas exist for the pool to continue
>>>>> functioning in a degraded state.
>>>>> action: Replace the faulted device, or use 'zpool clear' to mark the
>>>>> device repaired.
>>>>> scan: scrub in progress since Thu Aug 5 00:00:00 2021
>>>>> 6.00T scanned at 428M/s, 5.02T issued at 358M/s, 7.90T total
>>>>> 1M repaired, 63.59% done, 0 days 02:20:17 to go
>>>>> config:
>>>>> NAME          STATE     READ WRITE CKSUM
>>>>> jaguar        DEGRADED     0     0     0
>>>>>   raidz1-0    DEGRADED     0     0     0
>>>>>     c6t0d0    ONLINE       0     0     0
>>>>>     c6t2d0    ONLINE       0     0     0
>>>>>     c6t3d0    FAULTED      0     0     0  too many errors (repairing)
>>>>>
>>>>
>>>> Can you post output from:
>>>> iostat -En
>>>> fmadm faulty
>>>>
>>>> in any case, there definitely is a bug in error reporting -
>>>> counters are zero while “too many errors” is reported.
>>>>
>>>> rgds,
>>>> toomas
>>>> _______________________________________________
>>>> openindiana-discuss mailing list
>>>> [email protected]
>>>> https://openindiana.org/mailman/listinfo/openindiana-discuss
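
For reference, the task_file_status = 0x4041 value repeated in the ahci messages can be read bit by bit. The sketch below is Python and rests on an assumption not stated in the thread: that the driver prints the AHCI port task file data register, with the ATA status byte in bits 7:0 and the ATA error byte in bits 15:8. The function and table names are made up for illustration.

```python
# Assumption: the logged value is the AHCI port task file data register,
# i.e. ATA Status in bits 7:0 and ATA Error in bits 15:8.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_task_file_status(value):
    """Split a task file value into named status and error flags."""
    status = value & 0xFF         # low byte: ATA status register
    error = (value >> 8) & 0xFF   # next byte: ATA error register
    return {
        "status": [name for bit, name in STATUS_BITS.items() if status & bit],
        "error": [name for bit, name in ERROR_BITS.items() if error & bit],
    }


if __name__ == "__main__":
    # Value taken from the /var/adm/messages excerpt in this thread.
    print(decode_task_file_status(0x4041))
```

Under that assumption, 0x4041 decodes to status DRDY and ERR with error UNC (uncorrectable data). That would suggest the drive itself is reporting unreadable data rather than a cabling or transport problem, which would fit the advice to replace the disk.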
