I have confirmation from a user who has done verification for this kernel. Changing to verification-done-bionic.
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1783246

Title:
  Cephfs + fscache: unable to handle kernel NULL pointer dereference at
  0000000000000000 IP: jbd2__journal_start+0x22/0x1f0

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  SRU Justification
  -----------------

  [Impact]

  Certain sequences of file system operations on a cephfs volume backed
  by fscache with an ext4 store can cause a kernel BUG:

  [ 5818.932770] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  [ 5818.934354] IP: jbd2__journal_start+0x33/0x1e0
  ...
  [ 5818.962490] Call Trace:
  [ 5818.963055] ? ext4_writepages+0x5d5/0xf40
  [ 5818.963884] __ext4_journal_start_sb+0x6d/0x120
  [ 5818.964994] ext4_writepages+0x5d5/0xf40
  [ 5818.965991] ? __enqueue_entity+0x5c/0x60
  [ 5818.966791] ? check_preempt_wakeup+0x130/0x240
  [ 5818.967679] do_writepages+0x4b/0xe0
  [ 5818.968625] ? ext4_mark_inode_dirty+0x1d0/0x1d0
  [ 5818.969526] ? do_writepages+0x4b/0xe0
  [ 5818.970493] ? ext4_statfs+0x114/0x260
  [ 5818.971267] __filemap_fdatawrite_range+0xc1/0x100
  [ 5818.972425] ? __filemap_fdatawrite_range+0xc1/0x100
  [ 5818.973385] filemap_write_and_wait+0x31/0x90
  [ 5818.974461] ext4_bmap+0x8c/0xe0
  [ 5818.975150] cachefiles_read_or_alloc_pages+0x1bf/0xd90 [cachefiles]
  [ 5818.976718] ? _cond_resched+0x19/0x40
  [ 5818.977482] ? wake_up_bit+0x42/0x50
  [ 5818.978227] ? fscache_run_op.isra.8+0x4c/0x80 [fscache]
  [ 5818.979249] __fscache_read_or_alloc_pages+0x1d3/0x2e0 [fscache]
  [ 5818.980397] ceph_readpages_from_fscache+0x6c/0xe0 [ceph]
  [ 5818.981630] ceph_readpages+0x49/0x100 [ceph]
  [ 5818.982691] __do_page_cache_readahead+0x1c9/0x2c0
  [ 5818.983628] ? __cap_is_valid+0x21/0xb0 [ceph]
  [ 5818.984526] ondemand_readahead+0x11a/0x2a0
  [ 5818.985374] ? ondemand_readahead+0x11a/0x2a0
  [ 5818.986825] page_cache_async_readahead+0x71/0x80
  [ 5818.987751] generic_file_read_iter+0x784/0xbf0
  [ 5818.988663] ? ceph_put_cap_refs+0x1c4/0x330 [ceph]
  [ 5818.989620] ? page_cache_tree_insert+0xe0/0xe0
  [ 5818.990519] ceph_read_iter+0x106/0x820 [ceph]
  [ 5818.991818] new_sync_read+0xe4/0x130
  [ 5818.992588] __vfs_read+0x29/0x40
  [ 5818.993504] vfs_read+0x8e/0x130
  [ 5818.994192] SyS_read+0x55/0xc0
  [ 5818.994870] do_syscall_64+0x73/0x130
  [ 5818.995632] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

  [Fix]

  Cherry-pick 5d988308283ecf062fa88f20ae05c52cce0bcdca from upstream.

  This patch stops cephfs from reusing current->journal_info for its own
  internal use. jbd2 treats a non-NULL current->journal_info as the
  task's running journal handle, so when cephfs had parked its own
  context there and fscache then called into ext4 (ext4_bmap ->
  ext4_writepages -> jbd2__journal_start in the trace above), jbd2
  misread that context as a handle. With the patch, the field holds what
  ext4 expects when ext4 is reached via fscache.

  [Testcase]

  A user has been using the following test case:

  ( cat /proc/fs/fscache/stats > ~/test.log
    i=0
    while true; do
      touch small
      echo 3 > /proc/sys/vm/drop_caches &
      md5sum small
      let "i++"
      if ! (( $i % 1000 )); then
        echo "Test iteration $i done" >> ~/test.log
        cat /proc/fs/fscache/stats >> ~/test.log
      fi
    done ) > ~/nohup.out 2>&1

  It boils down to "touch file; drop caches; read file", with the cache
  drop started in the background so that it races with the read.

  Without the patch, this fails very quickly: usually on the first
  iteration, and always within a few. With the patch, the user ran this
  loop for over 60 hours without incident.

  [Regression potential]

  The change is not trivial, but it is limited to cephfs and has been in
  mainline since v4.16, so the risk of regression is well contained.