Maybe a bug somewhere else too. Which kernel/grsec/PaX combination was used?

On 05/09/2014 05:15 PM, Michael Orlitzky wrote:
> Last week, the LMTP daemon on our mail server (HP DL360 G6) crashed.
> People noticed that the mail stopped coming in, so I SSHed in to check
> on it, and there were some weird traces in the dmesg. While trying to
> investigate, I noticed some more badness:
>
>   # emerge -1 openntpd
>   Calculating dependencies... done!
>
>   >>> Verifying ebuild manifests
>   Killed
>
> At that point I'm thinking, "hardware problem, there goes the weekend."
> Most of my tools are committing suicide so I surrender and reboot. The
> thing comes up fine and has been working ever since.
>
> Today, another one of our web servers (HP DL360 G5?) does the same
> thing. The nightly log report was empty, because there's no syslog
> daemon running. This morning dmesg shows:
>
>> [Fri May 9 11:00:42 2014] PAX: refcount overflow detected in:
>> syslog-ng:21823, uid/euid: 0/0
>> [Fri May 9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted
>> 3.11.7-hardened-r1 #1
>> [Fri May 9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488
>> task.ti: ffff8802cffca488
>> [Fri May 9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>]
>> [<ffffffff810e311e>] 0xffffffff810e311e
>> [Fri May 9 11:00:42 2014] RSP: 0018:ffff880416f21c78 EFLAGS: 00000a96
>> [Fri May 9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edf00 RCX:
>> 0000000040276333
>> [Fri May 9 11:00:42 2014] RDX: 0000000040276332 RSI: 0000000000000000 RDI:
>> ffff88041d858720
>> [Fri May 9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010bc0 R09:
>> ffff88042fb10bc0
>> [Fri May 9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000fec3040 R12:
>> ffff88041f0048a0
>> [Fri May 9 11:00:42 2014] R13: ffff88026628ef00 R14: ffff88041d858720 R15:
>> ffff88041a1edf10
>> [Fri May 9 11:00:42 2014] FS: 0000000000000000(0000)
>> GS:ffff88042fb00000(0000) knlGS:0000000000000000
>> [Fri May 9 11:00:42 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Fri May 9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4:
>> 00000000000006b0
>> [Fri May 9 11:00:42 2014] Stack:
>> [Fri May 9 11:00:42 2014] 0000000000000000 ffffffff818dde60
>> ffff8804140ac100 ffff8802cffca570
>> [Fri May 9 11:00:42 2014] ffff8802cffca080 ffff880416eb4200
>> ffff8802cffca080 ffffffff81052750
>> [Fri May 9 11:00:42 2014] 0000000000000000 0000000000000001
>> ffff88038e6260d8 ffff8802cffca598
>> [Fri May 9 11:00:42 2014] Call Trace:
>> [Fri May 9 11:00:42 2014] [<ffffffff81052750>] ? 0xffffffff81052750
>> [Fri May 9 11:00:42 2014] [<ffffffff81036e10>] ? 0xffffffff81036e10
>> [Fri May 9 11:00:42 2014] [<ffffffff810371e8>] ? 0xffffffff810371e8
>> [Fri May 9 11:00:42 2014] [<ffffffff810449cc>] ? 0xffffffff810449cc
>> [Fri May 9 11:00:42 2014] [<ffffffff8100241f>] ? 0xffffffff8100241f
>> [Fri May 9 11:00:42 2014] [<ffffffff81002a89>] ? 0xffffffff81002a89
>> [Fri May 9 11:00:42 2014] [<ffffffff8137c212>] ? 0xffffffff8137c212
>> [Fri May 9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48
>> 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01
>> 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff
>> [Fri May 9 11:00:42 2014] PAX: refcount overflow detected in:
>> syslog-ng:21823, uid/euid: 0/0
>> [Fri May 9 11:00:42 2014] CPU: 2 PID: 21823 Comm: syslog-ng Not tainted
>> 3.11.7-hardened-r1 #1
>> [Fri May 9 11:00:42 2014] task: ffff8802cffca080 ti: ffff8802cffca488
>> task.ti: ffff8802cffca488
>> [Fri May 9 11:00:42 2014] RIP: 0010:[<ffffffff810e311e>]
>> [<ffffffff810e311e>] 0xffffffff810e311e
>> [Fri May 9 11:00:42 2014] RSP: 0018:ffff880416f21c78 EFLAGS: 00000a96
>> [Fri May 9 11:00:42 2014] RAX: ffff88041f0048a0 RBX: ffff88041a1edc00 RCX:
>> 0000000040c384f8
>> [Fri May 9 11:00:42 2014] RDX: 0000000040c384f7 RSI: 0000000000000000 RDI:
>> ffff88041d858720
>> [Fri May 9 11:00:42 2014] RBP: 0000000000000008 R08: 0000000000010b60 R09:
>> ffff88042fb10b60
>> [Fri May 9 11:00:42 2014] R10: 8000000000000000 R11: ffffea000f26a840 R12:
>> ffff88041f0048a0
>> [Fri May 9 11:00:42 2014] R13: ffff88026628e000 R14: ffff88041d858720 R15:
>> ffff88041a1edc10
>> [Fri May 9 11:00:42 2014] FS: 0000000000000000(0000)
>> GS:ffff88042fb00000(0000) knlGS:0000000000000000
>> [Fri May 9 11:00:42 2014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [Fri May 9 11:00:42 2014] CR2: 0000035fb5abf850 CR3: 000000000138a000 CR4:
>> 00000000000006b0
>> [Fri May 9 11:00:42 2014] Stack:
>> [Fri May 9 11:00:42 2014] 0000000000000000 ffffffff818dde60
>> ffff88041a1ed400 ffff8802cffca570
>> [Fri May 9 11:00:42 2014] ffff8802cffca080 ffff880416eb4200
>> ffff8802cffca080 ffffffff81052750
>> [Fri May 9 11:00:42 2014] 0000000000000000 0000000000000001
>> ffff88038e6260d8 ffff8802cffca598
>> [Fri May 9 11:00:42 2014] Call Trace:
>> [Fri May 9 11:00:42 2014] [<ffffffff81052750>] ? 0xffffffff81052750
>> [Fri May 9 11:00:42 2014] [<ffffffff81036e10>] ? 0xffffffff81036e10
>> [Fri May 9 11:00:42 2014] [<ffffffff810371e8>] ? 0xffffffff810371e8
>> [Fri May 9 11:00:42 2014] [<ffffffff810449cc>] ? 0xffffffff810449cc
>> [Fri May 9 11:00:42 2014] [<ffffffff8100241f>] ? 0xffffffff8100241f
>> [Fri May 9 11:00:42 2014] [<ffffffff81002a89>] ? 0xffffffff81002a89
>> [Fri May 9 11:00:42 2014] [<ffffffff8137c212>] ? 0xffffffff8137c212
>> [Fri May 9 11:00:42 2014] Code: e9 68 fd 01 00 0f 1f 84 00 00 00 00 00 48
>> 8b 43 18 48 8b 7b 10 48 8b 40 30 f0 ff 88 30 01 00 00 71 09 f0 ff 80 30 01
>> 00 00 cd 04 <0f> b7 00 89 c2 66 81 e2 00 b0 66 81 fa 00 20 0f 84 53 ff ff ff
>
>
> And things are segfaulting randomly. These machines have been running
> 3.11.7-hardened-r1 since 2014-01-03 without issue until now -- all of
> our servers have. So the timing seems a little coincidental.
>
> If it's not hardware (two different machines...), does this look like a
> kernel bug? Should I upgrade over the weekend and pray?
>