Hi!

TL;DR: My /home, which sits on dm-crypt on top of a software RAID5, blocks at irregular intervals, usually without any error messages.
I can get it going again with "fdisk -l /dev/sdx". Do you have any ideas how I can debug this issue further? Is it a dm-crypt, a software RAID (md) or a hardware issue?

---------------------------------------------------------------

Long version:

My /home "partition" is a dm-crypt volume on a software RAID5 with 5 SATA disks. See the system info further down in this mail.

Once in a while, user programs freeze because dm-crypt, or something further down the chain, blocks, presumably during a write on /home. If I am lucky and have a root shell already open, I can run

$ fdisk -l /dev/sdx

on one of the hard disks in the RAID, and the block disappears instantly.

I checked whether it could be a spin-down power management problem, but all disks that have a PM feature have it disabled, so I don't think this is the problem.

Last night I got a "blocked for more than 300 seconds." message in syslog; see <https://paste.debian.net/1060134/> (link valid for 90 days).

Log summary:

Jan 13 02:34:44 osprey kernel: [969696.242745] INFO: task md127_raid5:238 blocked for more than 300 seconds.
Jan 13 02:34:44 osprey kernel: [969696.242772] Call Trace:
Jan 13 02:34:44 osprey kernel: [969696.242789] ? __schedule+0x2a2/0x870
Jan 13 02:34:44 osprey kernel: [969696.242995] INFO: task dmcrypt_write:904 blocked for more than 300 seconds.
Jan 13 02:34:44 osprey kernel: [969696.243223] INFO: task jbd2/dm-2-8:917 blocked for more than 300 seconds.
Jan 13 02:34:44 osprey kernel: [969696.243525] INFO: task mpc:6622 blocked for more than 300 seconds.
Jan 13 02:34:44 osprey kernel: [969696.243997] INFO: task kworker/u8:0:6625 blocked for more than 300 seconds.

In this case I ran

$ fdisk -l /dev/sdf

and everything worked again.

As I understand the log, mpc (a user program) started and maybe accessed its config file on /home. ext4 tried to save the new access time, which went down the chain jbd2 -> dm-crypt and finally blocked in md127_raid5.
So it is most likely that I have a problem with the software RAID or the hard disks, isn't it? SMART is activated on all disks and does not show any errors.

How can I debug this further to solve the problem?

Thanks in advance for your suggestions.

Tom

---------------------------------------------------------------

System info:
============

Debian testing

$ uname -a
Linux osprey 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64 GNU/Linux

$ lsblk -i
NAME              MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                 8:0    0 74.5G  0 disk
|-sda1              8:1    0    4G  0 part
| `-cswap1        253:1    0    4G  0 crypt [SWAP]
`-sda2              8:2    0 70.5G  0 part
  `-osprey_root   253:0    0 70.5G  0 crypt /
sdb                 8:16   0  2.7T  0 disk
`-sdb1              8:17   0  2.7T  0 part
  `-md127           9:127  0 10.9T  0 raid5
    `-osprey_home 253:2    0 10.9T  0 crypt /home
sdc                 8:32   0  2.7T  0 disk
`-sdc1              8:33   0  2.7T  0 part
  `-md127           9:127  0 10.9T  0 raid5
    `-osprey_home 253:2    0 10.9T  0 crypt /home
sdd                 8:48   0  2.7T  0 disk
`-sdd1              8:49   0  2.7T  0 part
  `-md127           9:127  0 10.9T  0 raid5
    `-osprey_home 253:2    0 10.9T  0 crypt /home
sde                 8:64   0  2.7T  0 disk
`-sde1              8:65   0  2.7T  0 part
  `-md127           9:127  0 10.9T  0 raid5
    `-osprey_home 253:2    0 10.9T  0 crypt /home
sdf                 8:80   0  2.7T  0 disk
`-sdf1              8:81   0  2.7T  0 part
  `-md127           9:127  0 10.9T  0 raid5
    `-osprey_home 253:2    0 10.9T  0 crypt /home

$ sdparm --get STANDBY /dev/sd[bcdef]
/dev/sdb: ATA ST3000VN000-1H41 SC43
STANDBY not found in Power condition [po] mode page
/dev/sdc: ATA WDC WD30EURX-63T 0A80
STANDBY not found in Power condition [po] mode page
/dev/sdd: ATA TOSHIBA DT01ACA3 ABB0
STANDBY not found in Power condition [po] mode page
/dev/sde: ATA ST3000DM001-1CH1 CC27
STANDBY not found in Power condition [po] mode page
/dev/sdf: ATA WDC WD30EFRX-68E 0A80
STANDBY not found in Power condition [po] mode page

$ hdparm -B /dev/sd[bcdef]
/dev/sdb:
 APM_level = 254
/dev/sdc:
 APM_level = not supported
/dev/sdd:
 APM_level = off
/dev/sde:
 APM_level = 254
/dev/sdf:
 APM_level = not supported

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md127 : active raid5 sdc1[1] sdd1[2] sdb1[0] sdf1[5] sde1[3]
      11719766016 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 1/22 pages [4KB], 65536KB chunk

unused devices: <none>

$ for i in {b..f}; do echo "DISK: ${i}"; smartctl -a "/dev/sd${i}" | grep "SMART overall-health self-assessment test result"; done
DISK: b
SMART overall-health self-assessment test result: PASSED
DISK: c
SMART overall-health self-assessment test result: PASSED
DISK: d
SMART overall-health self-assessment test result: PASSED
DISK: e
SMART overall-health self-assessment test result: PASSED
DISK: f
SMART overall-health self-assessment test result: PASSED
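
For anyone comparing the full paste with the log summary earlier in this mail: a minimal sketch of how the blocked tasks can be pulled out of a saved kernel log, in the order they were reported. The function name blocked_tasks is made up for illustration, and the pattern only assumes the "INFO: task NAME:PID blocked for more than N seconds." format seen in the log summary; it is not a general hung-task parser.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Reads kernel log text on stdin and prints one line per hung-task report:
#   <task name> <pid> <seconds>
# Assumes the message format seen in the log summary:
#   INFO: task NAME:PID blocked for more than N seconds.
# Task names may themselves contain ':' (e.g. kworker/u8:0), so the PID is
# taken as the digits after the last ':' before " blocked".
blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds\..*/\1 \2 \3/p'
}
```

Possible usage, assuming the messages end up in /var/log/syslog as on a default Debian setup: `blocked_tasks < /var/log/syslog`. Lines that are not hung-task reports (such as the "Call Trace:" lines) are skipped.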