After a few hours of running the reproducer on the original (unpatched) kernel, refcount errors appear in the kernel log:

Focal:
-----

$ uname -rv
5.4.0-38-generic #42-Ubuntu SMP Mon Jun 8 14:14:24 UTC 2020

$ ./aa-refcnt-af_alg
<a few hours later>

[ 9581.048189] ------------[ cut here ]------------
[ 9581.049497] refcount_t overflow at apparmor_sk_clone_security+0x35/0x70 in aa-refcnt-af_al[1023], uid/euid: 1000/1000
[ 9581.052125] WARNING: CPU: 1 PID: 1023 at kernel/panic.c:677 refcount_error_report+0x9b/0xab
[ 9581.054428] Modules linked in: ...
[ 9581.063137] CPU: 1 PID: 1023 Comm: aa-refcnt-af_al Tainted: G OE 5.4.0-38-generic #42-Ubuntu
[ 9581.065494] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 9581.067693] RIP: 0010:refcount_error_report+0x9b/0xab
...
[ 9581.088358] Call Trace:
[ 9581.089083] ex_handler_refcount+0x50/0x70
[ 9581.090147] fixup_exception+0x4a/0x61
[ 9581.091142] do_trap+0x4e/0xf0
[ 9581.091998] do_error_trap+0x7c/0xc0
[ 9581.092958] ? csum_partial_copy_generic+0x1687/0x3a10
[ 9581.094250] do_invalid_op+0x3c/0x50
[ 9581.095210] ? csum_partial_copy_generic+0x1687/0x3a10
[ 9581.096505] invalid_op+0x1e/0x30
[ 9581.097413] RIP: 0010:apparmor_sk_clone_security+0x35/0x70
...
[ 9581.113048] security_sk_clone+0x2f/0x40
[ 9581.114078] af_alg_accept+0x7e/0x190 [af_alg]
[ 9581.115456] alg_accept+0x15/0x20 [af_alg]
[ 9581.116549] __sys_accept4+0x109/0x210
[ 9581.117549] ? _cond_resched+0x19/0x30
[ 9581.118545] __x64_sys_accept+0x1c/0x20
[ 9581.119573] do_syscall_64+0x57/0x190
[ 9581.120551] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9581.121821] RIP: 0033:0x7efc1bc390a7
...

Bionic:
------

$ uname -rv
4.15.0-107-generic #108-Ubuntu SMP Mon Jun 8 17:51:33 UTC 2020

$ ./aa-refcnt-af_alg
<a few hours later>

[ 8460.359291] ------------[ cut here ]------------
[ 8460.360638] refcount_t overflow at apparmor_sk_clone_security+0x37/0x70 in aa-refcnt-af_al[1243], uid/euid: 1000/1000
[ 8460.363332] WARNING: CPU: 1 PID: 1243 at /build/linux-oHXYZI/linux-4.15.0/kernel/panic.c:662 refcount_error_report+0x9c/0xac
[ 8460.366556] Modules linked in: ...
[ 8460.375936] CPU: 1 PID: 1243 Comm: aa-refcnt-af_al Tainted: G OE 4.15.0-107-generic #108-Ubuntu
[ 8460.378352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 8460.380598] RIP: 0010:refcount_error_report+0x9c/0xac
...
[ 8460.397294] Call Trace:
[ 8460.398331] ex_handler_refcount+0x52/0x80
[ 8460.399432] fixup_exception+0x3a/0x50
[ 8460.400462] do_trap+0x8a/0x140
[ 8460.401346] do_error_trap+0xa6/0x140
[ 8460.402355] ? csum_partial_copy_generic+0xcfb/0x27a0
[ 8460.403671] ? ___slab_alloc+0x204/0x4f0
[ 8460.404730] ? ___slab_alloc+0x204/0x4f0
[ 8460.405786] ? get_empty_filp+0x5c/0x1c0
[ 8460.406840] do_invalid_op+0x20/0x30
[ 8460.407830] invalid_op+0x1b/0x40
[ 8460.408755] RIP: 0010:apparmor_sk_clone_security+0x37/0x70
...
[ 8460.420262] security_sk_clone+0x33/0x50
[ 8460.421314] af_alg_accept+0x81/0x1c0 [af_alg]
[ 8460.422484] ? aa_sock_accept_perm+0x25/0x30
[ 8460.423623] alg_accept+0x15/0x20 [af_alg]
[ 8460.424725] SYSC_accept4+0xff/0x210
[ 8460.425706] ? mntput+0x24/0x40
[ 8460.426598] ? __fput+0x193/0x220
[ 8460.427536] ? _cond_resched+0x19/0x40
[ 8460.428561] ? task_work_run+0x46/0xc0
[ 8460.429586] SyS_accept+0x10/0x20
[ 8460.430518] do_syscall_64+0x73/0x130
[ 8460.431522] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 8460.432830] RIP: 0033:0x7f0ecc0c87e4
...

** Description changed:

  [Impact]

   * Users of the Crypto (user-space) API (i.e., AF_ALG)
     can trigger refcount errors in AppArmor under high load
     (might lead to memory leak or use after free.)

   * There is a reference leak in AppArmor when af_alg_accept()
     calls security_sock_graft() and then security_sk_clone().

   * Both acquire a reference to a label, to assign it to the
     same pointer, but the latter does not release the former's
     acquired reference (before overwriting the pointer value.)

   * This reference leak builds up over time, and under high load
     can eventually overflow/underflow/saturate refcount,
     depending on which value it has when a program hits that.

   * The fix just checks if the pointer has an assigned label,
     then releases its acquired reference.

  [Test Case]

+  * See comment #1 for the test-case 'aa-refcnt-af_alg.c'.
+
   * Exercise that code path indefinitely until it hits
     the refcount_t overflow/underflow/saturate message
-    (or not, with the patch.)
+    (or not, with the patch.) (see comment #4)

-  * See comment #1 for the test-case 'aa-refcnt-af_alg.c'.
-
-    If the problem happens, in a few hours there is an
-    error message in the kernel logs (see comment #1.)
+    If the problem happens, in a few hours there is an
+    error message in the kernel logs (see comment #1.)

   * It's possible to monitor refcount values with kprobes,
-    to confirm whether or not the problem is happening.
+    to confirm whether or not the problem is happening.
+    (see comments #2 and #3)

  [Other Info]

   * Patch applied upstream on v5.8-rc1 [1]
   * Applied on Unstable (tag Ubuntu-5.8-5.8.0-0.1)
   * Not required on Groovy (still 5.4; should sync from Unstable)
   * Not required on Eoan (EOL date before SRU cycle release date)
   * Required on Bionic and Focal.

  [1]
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=3b646abc5bc6c0df649daea4c2c976bd4d47e4c8

** Description changed:

  [Impact]

   * Users of the Crypto (user-space) API (i.e., AF_ALG)
     can trigger refcount errors in AppArmor under high load
     (might lead to memory leak or use after free.)

   * There is a reference leak in AppArmor when af_alg_accept()
     calls security_sock_graft() and then security_sk_clone().

   * Both acquire a reference to a label, to assign it to the
     same pointer, but the latter does not release the former's
     acquired reference (before overwriting the pointer value.)

   * This reference leak builds up over time, and under high load
     can eventually overflow/underflow/saturate refcount,
     depending on which value it has when a program hits that.

   * The fix just checks if the pointer has an assigned label,
     then releases its acquired reference.

  [Test Case]

   * See comment #1 for the test-case 'aa-refcnt-af_alg.c'.

-  * Exercise that code path indefinitely until it hits
-    the refcount_t overflow/underflow/saturate message
-    (or not, with the patch.) (see comment #4)
-
-    If the problem happens, in a few hours there is an
-    error message in the kernel logs (see comment #1.)
+  * Exercise that code path indefinitely until it hits
+    the refcount_t overflow/underflow/saturate message
+    (or not, with the patch.) (see comment #4)

   * It's possible to monitor refcount values with kprobes,
     to confirm whether or not the problem is happening.
-    (see comments #2 and #3)
+    (see comments #2 and #3)

  [Other Info]

   * Patch applied upstream on v5.8-rc1 [1]
   * Applied on Unstable (tag Ubuntu-5.8-5.8.0-0.1)
   * Not required on Groovy (still 5.4; should sync from Unstable)
   * Not required on Eoan (EOL date before SRU cycle release date)
   * Required on Bionic and Focal.

  [1]
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=3b646abc5bc6c0df649daea4c2c976bd4d47e4c8

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1883962

Title:
  apparmor reference leak causes refcount_t overflow with
  af_alg_accept()

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Eoan:
  Won't Fix
Status in linux source package in Focal:
  In Progress
Status in linux source package in Groovy:
  Won't Fix

Bug description:
  [Impact]

   * Users of the Crypto (user-space) API (i.e., AF_ALG)
     can trigger refcount errors in AppArmor under high load
     (might lead to memory leak or use after free.)

   * There is a reference leak in AppArmor when af_alg_accept()
     calls security_sock_graft() and then security_sk_clone().

   * Both acquire a reference to a label, to assign it to the
     same pointer, but the latter does not release the former's
     acquired reference (before overwriting the pointer value.)

   * This reference leak builds up over time, and under high load
     can eventually overflow/underflow/saturate refcount,
     depending on which value it has when a program hits that.

   * The fix just checks if the pointer has an assigned label,
     then releases its acquired reference.

  [Test Case]

   * See comment #1 for the test-case 'aa-refcnt-af_alg.c'.

   * Exercise that code path indefinitely until it hits
     the refcount_t overflow/underflow/saturate message
     (or not, with the patch.) (see comment #4)

   * It's possible to monitor refcount values with kprobes,
     to confirm whether or not the problem is happening.
     (see comments #2 and #3)

  [Other Info]

   * Patch applied upstream on v5.8-rc1 [1]
   * Applied on Unstable (tag Ubuntu-5.8-5.8.0-0.1)
   * Not required on Groovy (still 5.4; should sync from Unstable)
   * Not required on Eoan (EOL date before SRU cycle release date)
   * Required on Bionic and Focal.
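
  The leak described under [Impact] can be illustrated with a small
  user-space model. This is only a sketch, not the kernel code or the
  upstream patch: every name below (struct label, struct sk_ctx,
  model_sock_graft, model_sk_clone, model_sk_free, leaked_refs_after)
  is hypothetical, the reference count is a plain long rather than a
  refcount_t, and it assumes the grafted label and the parent socket's
  label are the same object. The with_fix flag switches on the
  "check the pointer, then release its reference" behaviour that the
  description attributes to the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a label: only its reference count is modeled. */
struct label { long refcount; };

/* Hypothetical stand-in for a socket's security context. */
struct sk_ctx { struct label *label; };

/* Take one reference on a label and return it. */
static struct label *get_label(struct label *l)
{
    if (l)
        l->refcount++;
    return l;
}

/* Drop one reference on a label. */
static void put_label(struct label *l)
{
    if (l) {
        assert(l->refcount > 0);
        l->refcount--;
    }
}

/* Models the graft step: give the new socket a label if it has none. */
static void model_sock_graft(struct sk_ctx *new_ctx, struct label *task_label)
{
    if (!new_ctx->label)
        new_ctx->label = get_label(task_label);
}

/*
 * Models the clone step: copy the parent's label into the new socket.
 * Without the fix, the pointer set by the graft step is overwritten
 * and the reference it held is never released.
 */
static void model_sk_clone(const struct sk_ctx *parent,
                           struct sk_ctx *new_ctx, int with_fix)
{
    if (with_fix && new_ctx->label)
        put_label(new_ctx->label);
    new_ctx->label = get_label(parent->label);
}

/* Models socket teardown: release the one reference the socket holds. */
static void model_sk_free(struct sk_ctx *ctx)
{
    put_label(ctx->label);
    ctx->label = NULL;
}

/* Run n accept/close cycles; return how many references were leaked. */
static long leaked_refs_after(long n, int with_fix)
{
    struct label task_label = { 1 };            /* held by the "task" */
    struct sk_ctx parent = { get_label(&task_label) };
    long baseline = task_label.refcount;

    for (long i = 0; i < n; i++) {
        struct sk_ctx child = { NULL };

        model_sock_graft(&child, &task_label);  /* +1 reference */
        model_sk_clone(&parent, &child, with_fix);
        model_sk_free(&child);                  /* -1 reference */
    }
    return task_label.refcount - baseline;
}
```

  In this model, leaked_refs_after(1000, 0) returns 1000 (one leaked
  reference per accept), while leaked_refs_after(1000, 1) returns 0.
  With a real refcount_t the same steady growth would eventually
  reach the overflow/saturation point reported in the kernel logs.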

  [1]
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=3b646abc5bc6c0df649daea4c2c976bd4d47e4c8

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883962/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp