Package: nfs-kernel-server Version: 1:1.2.8-9 Severity: critical Justification: breaks the whole system
Dear Maintainer, after writing to an NFS which make the underlying ext4 file system full (no more available space), nfsd crashes with this trace: Mai 30 02:50:16 itmserver2 kernel: INFO: task nfsd:4242 blocked for more than 120 seconds. Mai 30 02:50:16 itmserver2 kernel: Not tainted 3.16.0-4-amd64 #1 Mai 30 02:50:16 itmserver2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mai 30 02:50:16 itmserver2 kernel: nfsd D ffff8807f0202f780 4242 2 0x00000000 Mai 30 02:50:16 itmserver2 kernel: ffff8807f0202b20 0000000000000046 0000000000012f40 ffff8807ae473fd8 Mai 30 02:50:16 itmserver2 kernel: 0000000000012f40 ffff8807f0202b20 ffff8803dd0b4940 ffff8807ae473a80 Mai 30 02:50:16 itmserver2 kernel: ffff8803dd0b4944 ffff8807f0202b20 00000000ffffffff ffff8803dd0b4948 Mai 30 02:50:16 itmserver2 kernel: Call Trace: Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81517aa5>] ?schedule_preempt_disabled+0x25/0x70 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81519503>] ?__mutex_lock_slowpath+0xd3/0x1c0 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8151960b>] ?mutex_lock+0x1b/0x2a Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa02d7ac9>] ?ext4_file_write_iter+0x79/0x3a0 [ext4] Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81190384>] ?cache_grow+0x154/0x240 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff811aa420>] ?new_sync_read+0xa0/0xa0 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff811aa51f>] ?do_iter_readv_writev+0x5f/0x90 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff811aba7b>] ?do_readv_writev+0xbb/0x240 Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa02d7a50>] ?ext4_unwritten_wait+0xa0/0xa0 [ext4] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa02d7a50>] ?ext4_unwritten_wait+0xa0/0xa0 [ext4] Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8121ff55>] ?exportfs_decode_fh+0x95/0x2c0 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8108faff>] ?groups_alloc+0x2f/0xe0 Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa06298b9>] ?nfsd_vfs_write.isra.12+0x99/0x360 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa062cf59>] ?nfsd_write+0x89/0x110 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa06371ef>] ?nfsd4_write+0x1bf/0x220 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa0638b18>] ?nfsd4_proc_compound+0x4e8/0x7e0 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa0625d32>] ?nfsd_dispatch+0xb2/0x200 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa05d5d71>] ?svc_process_common+0x451/0x6e0 [sunrpc] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa0625630>] ?nfsd_destroy+0x70/0x70 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa05d610c>] ?svc_process+0x10c/0x160 [sunrpc] Mai 30 02:50:16 itmserver2 kernel: [<ffffffffa06256ef>] ?nfsd+0xbf/0x130 [nfsd] Mai 30 02:50:16 itmserver2 kernel: [<ffffffff810894fd>] ?kthread+0xbd/0xe0 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81089440>] ?kthread_create_on_node+0x180/0x180 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff8151ad98>] ?ret_from_fork+0x58/0x90 Mai 30 02:50:16 itmserver2 kernel: [<ffffffff81089440>] ?kthread_create_on_node+0x180/0x180 The filesystem in question: dumpe2fs 1.42.12 (29-Aug-2014) Filesystem volume name: space Last mounted on: /srv/space Filesystem UUID: ba1e6b57-d1ae-458f-850c-32894eb37a97 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 536870912 Block count: 4294967295 Reserved block count: 375809 Free blocks: 17076270 Free inodes: 533985328 First block: 0 Block size: 4096 Fragment size: 4096 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 4096 Inode blocks per group: 256 RAID stride: 128 RAID stripe width: 1280 Flex block group size: 16 Filesystem created: Wed May 27 13:50:23 2015 Last mount time: Tue May 30 09:50:51 2017 Last write time: Tue May 30 09:50:51 2017 Mount count: 1 Maximum mount count: -1 Last checked: Tue May 30 09:33:30 2017 Check interval: 0 (<none>) Lifetime writes: 219 TB Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 256 Required extra isize: 28 Desired extra isize: 28 Journal inode: 8 Default directory hash: half_md4 Directory Hash Seed: b2a64a16-0afc-40aa-8a92-a5a65fca1d5a Journal backup: inode blocks Journal features: journal_incompat_revoke Journal size: 128M Journal length: 32768 Journal sequence: 0x007ab78e Journal start: 31676 Please notice the size of 16 TB which is the maximum with 32 Bit and block size of 4096. Maybe this is important. The filesystem was created on a logical volume --- Logical volume --- LV Path /dev/md0_VG/space LV Name space VG Name md0_VG LV UUID 63JVuX-yu5E-qsrw-zCP0-X0S8-Erkv-ZYTxwk LV Write Access read/write LV Creation host, time itmserver2, 2015-05-27 13:23:24 +0200 LV Status available # open 1 LV Size 17.00 TiB Current LE 34816 Segments 2 Allocation inherit Read ahead sectors auto - currently set to 20480 Block device 254:1 and resized serveral times online. The LV is part of a VG --- Volume group --- VG Name md0_VG System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 59 VG Access read/write VG Status resizable MAX LV 0 Cur LV 5 Open LV 5 Max PV 0 Cur PV 1 Act PV 1 VG Size 36.39 TiB PE Size 512.00 MiB Total PE 74517 Alloc PE / Size 52324 / 25.55 TiB Free PE / Size 22193 / 10.84 TiB VG UUID ARRCBo-se6D-zNRI-Uz6m-f3Bc-0D2L-PdHGrO which consists of only one physical volume --- Physical volume --- PV Name /dev/md0 VG Name md0_VG PV Size 36.39 TiB / not usable 491.00 MiB Allocatable yes PE Size 512.00 MiB Total PE 74517 Free PE 22193 Allocated PE 52324 PV UUID 7edwwI-7bVb-dDN8-wdXc-Je9R-J5Kn-KvPaXF This physical volume is a RAID 6. Our NFS server uses Debian jessie (old stable) with kernel Linux itmserver2 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2 (2017-04-30) x86_64 GNU/Linux fter the freeze, the system was rendered unstable with weird behavior when accessing the file system. Restarting the NFSD service didn't help and unloading the NFS kernel module wasn't possible since it was still in use by NFS Daemons which were unkillable. Also the command lsof did not respond. Only a restart of the server was possible at this time, why the bug is reported as critical. Before the restart, we noticed high CPU of the process kworker/0:3 and high I/O (we used iotop) of jdb2/sda5-8. Notice that /dev/sda5 holds only the /var/ directory and is a completely separate hard disk as the NFS mount. After the restart we checked the file system: itmserver2:~ # fsck -f /dev/md0_VG/space fsck from util-linux 2.25.2 e2fsck 1.42.12 (29-Aug-2014) space: recovering journal Clearing orphaned inode 253823871 (uid=3144, gid=3100, mode=0100640, size=233609) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Free blocks count wrong (13130177, counted=17076270). Fix<y>? yes Free inodes count wrong (533998561, counted=533985328). Fix<y>? yes The freeze happened two times so far. Since it was a productive system which rendered many clients unusable, we did not had time to analyse the problem deeper but restarted the server instead. Each time, a user reported that a script on a client has written large quantities of data to the NFS mount. Indeed, the file system was _almost_ full upon inspection (0.5 % free but the reserved blocks are only 0.01 %), so it may has reached its limit and then aborted writing the last file resulting in an almost full file system? It is assumed that the freeze of nfsd happened when the filesystem limit was reached. The expected outcome would be to raise an I/O error because of the full file system instead of freezing. -- Package-specific info: -- rpcinfo -- program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 100021 1 udp 44927 nlockmgr 100021 3 udp 44927 nlockmgr 100021 4 udp 44927 nlockmgr 100021 1 tcp 56490 nlockmgr 100021 3 tcp 56490 nlockmgr 100021 4 tcp 56490 nlockmgr 100007 2 udp 811 ypbind 100007 1 udp 811 ypbind 100007 2 tcp 812 ypbind 100007 1 tcp 812 ypbind 100003 2 tcp 2049 nfs 100003 3 tcp 2049 nfs 100003 4 tcp 2049 nfs 100227 2 tcp 2049 100227 3 tcp 2049 100003 2 udp 2049 nfs 100003 3 udp 2049 nfs 100003 4 udp 2049 nfs 100227 2 udp 2049 100227 3 udp 2049 100005 1 udp 59788 mountd 100005 1 tcp 46566 mountd 100005 2 udp 35896 mountd 100005 2 tcp 52752 mountd 100005 3 udp 59974 mountd 100005 3 tcp 46390 mountd 100024 1 udp 53921 status 100024 1 tcp 59277 status -- /etc/default/nfs-kernel-server -- RPCNFSDCOUNT=8 RPCNFSDPRIORITY=0 RPCMOUNTDOPTS="--manage-gids" NEED_SVCGSSD="" RPCSVCGSSDOPTS="" -- /etc/exports -- [removed for data security reasons] -- /proc/fs/nfs/exports -- # Version 1.1 # Path Client(Flags) # IPs [removed for data security reasons] -- System Information: Debian Release: 8.8 APT prefers oldstable-updates APT policy: (500, 'oldstable-updates'), (500, 'oldstable') Architecture: amd64 (x86_64) Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages nfs-kernel-server depends on: ii libblkid1 2.25.2-6 ii libc6 2.19-18+deb8u10 ii libcap2 1:2.24-8 ii libsqlite3-0 3.8.7.1-1+deb8u2 ii libtirpc1 0.2.5-1+deb8u1 ii libwrap0 7.6.q-25 ii lsb-base 4.1+Debian13+nmu1 ii nfs-common 1:1.2.8-9 ii ucf 3.0030 nfs-kernel-server recommends no packages. nfs-kernel-server suggests no packages. -- no debconf information