On Mon, Dec 27, 2021 at 8:46 AM Wols Lists <antli...@youngman.org.uk> wrote:
>
> On 27/12/2021 13:40, Michael wrote:
> > On Monday, 27 December 2021 11:32:39 GMT Wols Lists wrote:
> >> On 27/12/2021 11:07, Jacques Montier wrote:
> >>> Well, i don't know if my partitions are aligned or mis-aligned... How
> >>> could i get it ?
> >>
> >> fdisk would have spewed a bunch of warnings. So you're okay.
> >>
> >> I'm not sure of the details, but it's the classic "off by one" problem -
> >> if there's a mismatch between the kernel block size and the disk block
> >> size any writes required doing a read-update-write cycle which of course
> >> knackered performance. I had that hit a while back.
> >>
> >> But seeing as fdisk isn't moaning, that isn't the problem ...
> >>
> >> Cheers,
> >> Wol
> >
> > I also thought of misaligned boundaries when I first saw the error, but the
> > mention of Seagate by the OP pointed me to another edge case which crept up
> > with zstd compression on ZFS. I'm mentioning it here in case it is
> > relevant:
> >
> > https://livelace.ru/posts/2021/Jul/19/unaligned-write-command/
> >
> that might be of interest to me ... I'm getting system lockups but it's
> not an SSD. I've got two IronWolves and a Barracuda.
>
> But I notice the OP has a Barra*C*uda. Note the different spelling.
> That's a shingled drive I believe, which shouldn't make a lot of
> difference in light usage, but you don't want to hammer it!

I've run into this issue and I've seen rare reports of it online, but
no sign of a resolution. I'm pretty sure it is some sort of bug in the
kernel. I've tended to see it under load, and mostly when using ZFS.

I do not use zstd compression and do not have any zvols on the pools
that had this issue. So either there are multiple problems, or that
linked post did not correctly identify the root cause (which seems
likely). I'm guessing it is triggered under load, and perhaps using
zstd compression helps create that load.

I haven't seen it much lately, probably for two reasons: I've shifted
a lot of my load to LizardFS, and I'm using USB3 hard drives for the
bulk of my storage. Since these seem to be ATA errors, removing the
SATA host and its associated drivers from the path may bypass the
problem.

I doubt this has anything to do with physical/logical sector size and
partition alignment. The disks should still work correctly if
partitions aren't aligned to the physical sectors - they should just
suffer degraded performance. In any case, all my drives are aligned
on physical sector boundaries.

I'm not familiar enough with ATA to understand what the actual errors
are referring to. Here is an example of one of the errors I've had in
the past from one of these situations. A zpool scrub usually clears
up any damage, and then the drive works normally until the issue
happens again (which hasn't happened in quite a while for me now).

I have a dump of the SMART logs and the kernel ring buffer:

ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 12838 hours (534 days + 22 hours)
  When the command that caused the error occurred, the device was
  active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 e0 88 cc c3 06  Error: ICRC, ABRT at LBA = 0x06c3cc88 = 113495176

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 c0 68 cb c3 40 08   2d+00:45:18.962  WRITE FPDMA QUEUED
  60 00 b8 98 67 00 40 08   2d+00:45:18.917  READ FPDMA QUEUED
  60 00 b0 98 65 00 40 08   2d+00:45:18.916  READ FPDMA QUEUED
  60 00 a8 98 66 00 40 08   2d+00:45:18.916  READ FPDMA QUEUED
  61 00 a0 68 ca c3 40 08   2d+00:45:18.879  WRITE FPDMA QUEUED

[354064.268896] ata6.00: exception Emask 0x11 SAct 0x1000000 SErr 0x480000 action 0x6 frozen
[354064.268907] ata6.00: irq_stat 0x48000008, interface fatal error
[354064.268910] ata6: SError: { 10B8B Handshk }
[354064.268915] ata6.00: failed command: WRITE FPDMA QUEUED
[354064.268919] ata6.00: cmd 61/00:c0:68:cb:c3/07:00:06:01:00/40 tag 24 ncq dma 917504 out
                         res 50/00:00:68:cb:c3/00:07:06:01:00/40 Emask 0x10 (ATA bus error)
[354064.268922] ata6.00: status: { DRDY }
[354064.268926] ata6: hard resetting link
[354064.731093] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[354064.734739] ata6.00: configured for UDMA/133
[354064.734759] sd 5:0:0:0: [sdc] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[354064.734764] sd 5:0:0:0: [sdc] tag#24 Sense Key : Illegal Request [current]
[354064.734767] sd 5:0:0:0: [sdc] tag#24 Add. Sense: Unaligned write command
[354064.734771] sd 5:0:0:0: [sdc] tag#24 CDB: Write(16) 8a 00 00 00 00 01 06 c3 cb 68 00 00 07 00 00 00
[354064.734774] print_req_error: I/O error, dev sdc, sector 4408462184
[354064.734791] ata6: EH complete

-- 
Rich
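
For anyone who wants to check the alignment question against a log like the one above, the Write(16) CDB can be decoded by hand. What follows is only a rough sketch, not a diagnosis. It assumes the standard Write(16) layout (opcode 0x8a, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13) and 4096-byte physical sectors, which is an assumption since the log does not state the drive's physical sector size. The helper name decode_write16 is made up for illustration.

```python
# Decode the Write(16) CDB printed in the kernel log and test whether the
# request starts and ends on a physical-sector boundary.
# ASSUMPTION: 4096-byte physical sectors (not stated anywhere in the log).

CDB_HEX = "8a 00 00 00 00 01 06 c3 cb 68 00 00 07 00 00 00"
DMA_BYTES = 917504        # from "ncq dma 917504 out" in the same log
PHYSICAL_SECTOR = 4096    # assumed, see above


def decode_write16(cdb_hex):
    """Return (lba, sector_count) from a Write(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a Write(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: starting LBA
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: length in sectors
    return lba, count


lba, count = decode_write16(CDB_HEX)

# Logical sector size follows from the transfer size the kernel reported.
logical_sector = DMA_BYTES // count
sectors_per_physical = PHYSICAL_SECTOR // logical_sector

start_aligned = lba % sectors_per_physical == 0
length_aligned = count % sectors_per_physical == 0

print("LBA:", lba)
print("sectors:", count)
print("logical sector size:", logical_sector)
print("start aligned:", start_aligned, "length aligned:", length_aligned)
```

The decoded LBA comes out as 4408462184, the same sector that print_req_error reports, and under the 4096-byte assumption both the start and the length of the write fall on physical-sector boundaries. That is consistent with the doubt above that partition alignment is the cause, though it says nothing about what the actual cause is.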