Thanks for your reply (I expected this to go into the ether, since it
seems to be a very common issue, with I/O, on this hardware platform).

I did try switching from LZJB, to LZ4, and got slightly-better
performance, and reliability, from at least testing with backing up a
Windows 10 machine, using WSL, and Borg, over SSH - however, I can still
lock the system up, after several runs, or get it to a stage where I can
send commands, over an open SSH session, but spawning a new one fails,
since it can't access my home directory (which is stored on the ZFS
array).

My last log, with Borg, is:

[   22.536219] bcmgenet: Skipping UMAC reset
[   22.537803] bcmgenet fd580000.ethernet: configuring instance for external 
RGMII
[   22.538029] bcmgenet fd580000.ethernet eth0: Link is Down
[   25.140217] new mount options do not match the existing superblock, will be 
ignored
[   26.751285] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[   27.651756] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
control rx/tx
[   27.651790] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   29.696209] Started bpfilter
[   29.701240] bpfilter: Loaded bpfilter_umh pid 3166
[   29.858264] bridge: filtering via arp/ip/ip6tables is no longer available by 
default. Update your scripts to load br_netfilter if you need this.
[   30.379027] kauditd_printk_skb: 27 callbacks suppressed
[   30.379032] audit: type=1400 audit(1602433969.316:39): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="/usr/bin/lxc-start" pid=3259 comm="apparmor_parser"
[   30.457589] audit: type=1400 audit(1602433969.396:40): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default" pid=3266 
comm="apparmor_parser"
[   30.457603] audit: type=1400 audit(1602433969.396:41): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-cgns" pid=3266 
comm="apparmor_parser"
[   30.457620] audit: type=1400 audit(1602433969.396:42): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-with-mounting" pid=3266 
comm="apparmor_parser"
[   30.457629] audit: type=1400 audit(1602433969.396:43): apparmor="STATUS" 
operation="profile_replace" info="same as current profile, skipping" 
profile="unconfined" name="lxc-container-default-with-nesting" pid=3266 
comm="apparmor_parser"
[   36.556387] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   36.556393] Bluetooth: BNEP filters: protocol multicast
[   36.556407] Bluetooth: BNEP socket layer initialized
[   74.720044] Adding 16777212k swap on /dev/zd0.  Priority:-2 extents:1 
across:16777212k SSFS
[10030.116676] INFO: task z_rd_int:2154 blocked for more than 120 seconds.
[10030.123432]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.130389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.138388] z_rd_int        D    0  2154      2 0x00000028
[10030.138403] Call trace:
[10030.138424]  __switch_to+0x104/0x170
[10030.138435]  __schedule+0x30c/0x7c0
[10030.138444]  schedule+0x3c/0xb8
[10030.138453]  io_schedule+0x20/0x58
[10030.138466]  rq_qos_wait+0x100/0x178
[10030.138475]  wbt_wait+0xb4/0xf0
[10030.138485]  __rq_qos_throttle+0x38/0x50
[10030.138497]  blk_mq_make_request+0x128/0x610
[10030.138509]  generic_make_request+0xb4/0x2d8
[10030.138519]  submit_bio+0x48/0x218
[10030.138756]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.138977]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.139197]  zio_execute+0xac/0x110 [zfs]
[10030.139414]  vdev_queue_io_done+0x18c/0x2a0 [zfs]
[10030.139632]  zio_vdev_io_done+0xec/0x220 [zfs]
[10030.139849]  zio_execute+0xac/0x110 [zfs]
[10030.139891]  taskq_thread+0x2f8/0x570 [spl]
[10030.139904]  kthread+0xfc/0x128
[10030.139915]  ret_from_fork+0x10/0x1c
[10030.139928] INFO: task z_rd_int:2158 blocked for more than 120 seconds.
[10030.146692]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.153619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.161608] z_rd_int        D    0  2158      2 0x00000028
[10030.161622] Call trace:
[10030.161643]  __switch_to+0x104/0x170
[10030.161655]  __schedule+0x30c/0x7c0
[10030.161663]  schedule+0x3c/0xb8
[10030.161673]  io_schedule+0x20/0x58
[10030.161686]  rq_qos_wait+0x100/0x178
[10030.161694]  wbt_wait+0xb4/0xf0
[10030.161705]  __rq_qos_throttle+0x38/0x50
[10030.161717]  blk_mq_make_request+0x128/0x610
[10030.161729]  generic_make_request+0xb4/0x2d8
[10030.161739]  submit_bio+0x48/0x218
[10030.161965]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.162186]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.162416]  zio_nowait+0xd4/0x170 [zfs]
[10030.162633]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10030.162859]  zio_vdev_io_done+0xec/0x220 [zfs]
[10030.163078]  zio_execute+0xac/0x110 [zfs]
[10030.163123]  taskq_thread+0x2f8/0x570 [spl]
[10030.163136]  kthread+0xfc/0x128
[10030.163148]  ret_from_fork+0x10/0x1c
[10030.163160] INFO: task z_wr_iss:2163 blocked for more than 120 seconds.
[10030.169921]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.176850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.184833] z_wr_iss        D    0  2163      2 0x00000028
[10030.184847] Call trace:
[10030.184861]  __switch_to+0x104/0x170
[10030.184871]  __schedule+0x30c/0x7c0
[10030.184879]  schedule+0x3c/0xb8
[10030.184887]  io_schedule+0x20/0x58
[10030.184900]  rq_qos_wait+0x100/0x178
[10030.184908]  wbt_wait+0xb4/0xf0
[10030.184918]  __rq_qos_throttle+0x38/0x50
[10030.184930]  blk_mq_make_request+0x128/0x610
[10030.184942]  generic_make_request+0xb4/0x2d8
[10030.184952]  submit_bio+0x48/0x218
[10030.185175]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.185392]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.185609]  zio_nowait+0xd4/0x170 [zfs]
[10030.185825]  vdev_queue_io+0x23c/0x258 [zfs]
[10030.186043]  zio_vdev_io_start+0x94/0x2b8 [zfs]
[10030.186260]  zio_nowait+0xd4/0x170 [zfs]
[10030.186476]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[10030.186693]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[10030.186910]  zio_execute+0xac/0x110 [zfs]
[10030.186948]  taskq_thread+0x2f8/0x570 [spl]
[10030.186960]  kthread+0xfc/0x128
[10030.186971]  ret_from_fork+0x10/0x1c
[10030.186984] INFO: task z_wr_int:2172 blocked for more than 120 seconds.
[10030.193743]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.200666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.208657] z_wr_int        D    0  2172      2 0x00000028
[10030.208671] Call trace:
[10030.208685]  __switch_to+0x104/0x170
[10030.208696]  __schedule+0x30c/0x7c0
[10030.208710]  schedule+0x3c/0xb8
[10030.208718]  io_schedule+0x20/0x58
[10030.208730]  rq_qos_wait+0x100/0x178
[10030.208742]  wbt_wait+0xb4/0xf0
[10030.208753]  __rq_qos_throttle+0x38/0x50
[10030.208764]  blk_mq_make_request+0x128/0x610
[10030.208776]  generic_make_request+0xb4/0x2d8
[10030.208787]  submit_bio+0x48/0x218
[10030.209052]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10030.209273]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10030.209503]  zio_nowait+0xd4/0x170 [zfs]
[10030.209720]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10030.209948]  zio_vdev_io_done+0xec/0x220 [zfs]
[10030.210166]  zio_execute+0xac/0x110 [zfs]
[10030.210212]  taskq_thread+0x2f8/0x570 [spl]
[10030.210225]  kthread+0xfc/0x128
[10030.210236]  ret_from_fork+0x10/0x1c
[10030.210262] INFO: task txg_sync:2333 blocked for more than 120 seconds.
[10030.217024]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.223955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.231946] txg_sync        D    0  2333      2 0x00000028
[10030.231960] Call trace:
[10030.231975]  __switch_to+0x104/0x170
[10030.231985]  __schedule+0x30c/0x7c0
[10030.231993]  schedule+0x3c/0xb8
[10030.232003]  schedule_timeout+0x9c/0x190
[10030.232011]  io_schedule_timeout+0x28/0x48
[10030.232050]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[10030.232084]  __cv_timedwait_io+0x3c/0x50 [spl]
[10030.232305]  zio_wait+0x130/0x2a0 [zfs]
[10030.232524]  dsl_pool_sync+0x3fc/0x498 [zfs]
[10030.232741]  spa_sync+0x538/0xe68 [zfs]
[10030.232997]  txg_sync_thread+0x2c0/0x468 [zfs]
[10030.233042]  thread_generic_wrapper+0x74/0xa0 [spl]
[10030.233057]  kthread+0xfc/0x128
[10030.233068]  ret_from_fork+0x10/0x1c
[10030.233229] INFO: task borg:6275 blocked for more than 120 seconds.
[10030.239643]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.246553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.254520] borg            D    0  6275   6274 0x00000000
[10030.254529] Call trace:
[10030.254538]  __switch_to+0x104/0x170
[10030.254545]  __schedule+0x30c/0x7c0
[10030.254550]  schedule+0x3c/0xb8
[10030.254556]  schedule_timeout+0x9c/0x190
[10030.254561]  io_schedule_timeout+0x28/0x48
[10030.254586]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[10030.254606]  __cv_timedwait_io+0x3c/0x50 [spl]
[10030.254741]  zio_wait+0x130/0x2a0 [zfs]
[10030.254871]  dbuf_read+0x2e4/0xbb8 [zfs]
[10030.255001]  dmu_tx_check_ioerr+0x80/0xf0 [zfs]
[10030.255130]  dmu_tx_hold_zap_impl+0x84/0xc8 [zfs]
[10030.255259]  dmu_tx_hold_zap+0x50/0x60 [zfs]
[10030.255387]  dmu_tx_hold_sa_create+0x108/0x110 [zfs]
[10030.255518]  zfs_create+0x1d0/0x7d0 [zfs]
[10030.255649]  zpl_create+0xb8/0x178 [zfs]
[10030.255657]  lookup_open+0x4ec/0x6a8
[10030.255663]  do_last+0x260/0x8c0
[10030.255669]  path_openat+0x84/0x258
[10030.255675]  do_filp_open+0x84/0x108
[10030.255682]  do_sys_open+0x180/0x2b0
[10030.255689]  __arm64_sys_openat+0x2c/0x38
[10030.255695]  el0_svc_common.constprop.0+0x80/0x218
[10030.255700]  el0_svc_handler+0x34/0xa0
[10030.255706]  el0_svc+0x10/0x2cc
[10030.255744] INFO: task du:441566 blocked for more than 120 seconds.
[10030.262133]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10030.269041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10030.277009] du              D    0 441566   3688 0x00000000
[10030.277018] Call trace:
[10030.277030]  __switch_to+0x104/0x170
[10030.277036]  __schedule+0x30c/0x7c0
[10030.277041]  schedule+0x3c/0xb8
[10030.277048]  schedule_timeout+0x9c/0x190
[10030.277053]  io_schedule_timeout+0x28/0x48
[10030.277078]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[10030.277099]  __cv_timedwait_io+0x3c/0x50 [spl]
[10030.277236]  zio_wait+0x130/0x2a0 [zfs]
[10030.277367]  dbuf_read+0x2e4/0xbb8 [zfs]
[10030.277507]  dmu_buf_hold+0xa4/0xd8 [zfs]
[10030.277638]  zap_lockdir+0x70/0xe8 [zfs]
[10030.277774]  zap_cursor_retrieve+0x164/0x2c8 [zfs]
[10030.277905]  zfs_readdir+0x124/0x398 [zfs]
[10030.278040]  zpl_iterate+0x50/0x80 [zfs]
[10030.278049]  iterate_dir+0x14c/0x1a0
[10030.278055]  ksys_getdents64+0xac/0x1f8
[10030.278066]  __arm64_sys_getdents64+0x28/0xd8
[10030.278072]  el0_svc_common.constprop.0+0x80/0x218
[10030.278077]  el0_svc_handler+0x34/0xa0
[10030.278083]  el0_svc+0x10/0x2cc
[10150.949406] INFO: task z_rd_int:2154 blocked for more than 241 seconds.
[10150.956133]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10150.963040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10150.970998] z_rd_int        D    0  2154      2 0x00000028
[10150.971005] Call trace:
[10150.971017]  __switch_to+0x104/0x170
[10150.971024]  __schedule+0x30c/0x7c0
[10150.971027]  schedule+0x3c/0xb8
[10150.971031]  io_schedule+0x20/0x58
[10150.971037]  rq_qos_wait+0x100/0x178
[10150.971041]  wbt_wait+0xb4/0xf0
[10150.971045]  __rq_qos_throttle+0x38/0x50
[10150.971050]  blk_mq_make_request+0x128/0x610
[10150.971056]  generic_make_request+0xb4/0x2d8
[10150.971060]  submit_bio+0x48/0x218
[10150.971217]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10150.971307]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10150.971396]  zio_execute+0xac/0x110 [zfs]
[10150.971484]  vdev_queue_io_done+0x18c/0x2a0 [zfs]
[10150.971572]  zio_vdev_io_done+0xec/0x220 [zfs]
[10150.971660]  zio_execute+0xac/0x110 [zfs]
[10150.971677]  taskq_thread+0x2f8/0x570 [spl]
[10150.971684]  kthread+0xfc/0x128
[10150.971689]  ret_from_fork+0x10/0x1c
[10150.971695] INFO: task z_rd_int:2158 blocked for more than 241 seconds.
[10150.978427]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10150.985323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10150.993278] z_rd_int        D    0  2158      2 0x00000028
[10150.993285] Call trace:
[10150.993293]  __switch_to+0x104/0x170
[10150.993300]  __schedule+0x30c/0x7c0
[10150.993303]  schedule+0x3c/0xb8
[10150.993308]  io_schedule+0x20/0x58
[10150.993315]  rq_qos_wait+0x100/0x178
[10150.993318]  wbt_wait+0xb4/0xf0
[10150.993322]  __rq_qos_throttle+0x38/0x50
[10150.993328]  blk_mq_make_request+0x128/0x610
[10150.993333]  generic_make_request+0xb4/0x2d8
[10150.993337]  submit_bio+0x48/0x218
[10150.993438]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10150.993545]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10150.993636]  zio_nowait+0xd4/0x170 [zfs]
[10150.993727]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[10150.993815]  zio_vdev_io_done+0xec/0x220 [zfs]
[10150.993906]  zio_execute+0xac/0x110 [zfs]
[10150.993923]  taskq_thread+0x2f8/0x570 [spl]
[10150.993933]  kthread+0xfc/0x128
[10150.993937]  ret_from_fork+0x10/0x1c
[10150.993944] INFO: task z_wr_iss:2163 blocked for more than 241 seconds.
[10151.000674]       Tainted: P         C OE     5.4.0-1018-raspi #20-Ubuntu
[10151.007573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[10151.015527] z_wr_iss        D    0  2163      2 0x00000028
[10151.015533] Call trace:
[10151.015541]  __switch_to+0x104/0x170
[10151.015545]  __schedule+0x30c/0x7c0
[10151.015549]  schedule+0x3c/0xb8
[10151.015552]  io_schedule+0x20/0x58
[10151.015558]  rq_qos_wait+0x100/0x178
[10151.015561]  wbt_wait+0xb4/0xf0
[10151.015565]  __rq_qos_throttle+0x38/0x50
[10151.015570]  blk_mq_make_request+0x128/0x610
[10151.015575]  generic_make_request+0xb4/0x2d8
[10151.015579]  submit_bio+0x48/0x218
[10151.015677]  vdev_disk_io_start+0x670/0x9f8 [zfs]
[10151.015765]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[10151.015852]  zio_nowait+0xd4/0x170 [zfs]
[10151.015939]  vdev_queue_io+0x23c/0x258 [zfs]
[10151.016031]  zio_vdev_io_start+0x94/0x2b8 [zfs]
[10151.016118]  zio_nowait+0xd4/0x170 [zfs]
[10151.016205]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[10151.016292]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[10151.016379]  zio_execute+0xac/0x110 [zfs]
[10151.016395]  taskq_thread+0x2f8/0x570 [spl]
[10151.016401]  kthread+0xfc/0x128
[10151.016405]  ret_from_fork+0x10/0x1c.

Interestingly, after changing compression algorithm, it starts to use
the swap ZVOL, for the first time, which gives a little more runway,
until it fails.

Running your ZDB command gives me:
tyson@ubuntu:~$ sudo zdb -s Yaesu
[sudo] password for tyson:
                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
Yaesu                     8.49T 2.39T 2.88K     0 53.5M     0     0     0     0
  /dev/disk/by-id/usb-QNAP_TR-004_DISK00_51323032493034393139-0:0-part1         
                                      3.50T  132G   668     0 7.14M     0     0 
    0     0
  /dev/disk/by-id/usb-QNAP_TR-004_DISK01_51323032493034393139-0:1-part1         
                                      2.29T 1.33T   884     0 24.6M     0     0 
    0     0
  /dev/disk/by-id/usb-QNAP_TR-004_DISK02_51323032493034393139-0:2-part1         
                                      1.44T  383G   706     0 9.76M     0     0 
    0     0
  /dev/disk/by-id/usb-QNAP_TR-004_DISK03_51323032493034393139-0:3-part1         
                                      1.26T  565G   686     0 12.0M     0     0 
    0     0

Sadly, I haven't got a drive large enough, to hold all of the data in
the pool (around 11TB, deduplicated/compressed, and with snapshots!), to
try imaging all of the disks, and dumping it into a new pool, without
deduplication enabled. :(

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1899249

Title:
  OpenZFS writing stalls, under load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1899249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to