Control: found -1 3.16.7-ckt11-1+deb8u5
Control: notfound -1 3.16.7-ckt11-1

Thanks for your report.

On Wed, 2015-11-04 at 18:53 +0100, Jan Prunk wrote:
> Package: src:linux
> Version: 3.16.7-ckt11-1

From your text and the screenshot I think this should really be +deb8u5.
I've updated the bug metadata with the Control lines at the top of this
mail.

> Severity: important
>
> Dear Maintainer,
> The following kernel panic error appears at random in Xen
> virtualisation.

Do you mean it has appeared randomly from time to time (i.e. more than
once), or have you had a single random instance?

> Please look at the error in screenshot attachment.
> It's a Debian 8, Kernel 3.16.7-ckt11-1+deb8u5, Xen 4.4.4-pre

The screenshot shows a fault at 0xffffffff812b6dad == memcpy+0xd, called
from ndisc_send_redirect+0x3bf. Unfortunately, disassembling memcpy from
what I think is the correct dbg package[0] results in:

    Dump of assembler code for function memcpy:
       0xffffffff812b6da0 <+0>:     mov    %rdi,%rax
       0xffffffff812b6da3 <+3>:     cmp    $0x20,%rdx
       0xffffffff812b6da7 <+7>:     jb     0xffffffff812b6e27 <memcpy+135>
       0xffffffff812b6da9 <+9>:     cmp    %dil,%sil
       0xffffffff812b6dac <+12>:    jl     0xffffffff812b6de3 <memcpy+67>
       0xffffffff812b6dae <+14>:    sub    $0x20,%rdx
       0xffffffff812b6db2 <+18>:    sub    $0x20,%rdx

i.e. the faulting %rip (0xffffffff812b6dad) is not on an instruction
boundary (it would be in the middle of that jl instruction, which cannot
happen). The call in ndisc_send_redirect disassembles sensibly and
matches up ok.

If I decode the faulting address as if it were on an instruction
boundary then I get:

    (gdb) x/i 0xffffffff812b6dad
       0xffffffff812b6dad <memcpy+13>:      xor    $0x20ea8348,%eax

which isn't accessing RAM and therefore surely cannot fault.

The version you have given is corroborated by the screenshot and I am
pretty sure I have got the correct dbg package to match. I suppose you
haven't rebuilt the kernel or anything like that?

I don't like to put things down to "cosmic rays", but if this was a
one-off then I'm struggling to think of anything else to explain what
appears to be a single bit error in %rip.

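As a rough illustration of the reasoning above (not part of the original analysis), the sketch below checks two things: which instruction boundaries from the listed disassembly differ from the faulting %rip in exactly one bit, and what the bytes starting at memcpy+13 decode to. The raw opcode bytes are an assumption: they were not in the report and are reconstructed here from the standard x86-64 encodings of a short `jl` (0x7c plus an 8-bit displacement) and `sub $0x20,%rdx` (48 83 ea 20). The function names are hypothetical.

```python
# Instruction start addresses copied from the memcpy disassembly above.
BOUNDARIES = [
    0xffffffff812b6da0,
    0xffffffff812b6da3,
    0xffffffff812b6da7,
    0xffffffff812b6da9,
    0xffffffff812b6dac,
    0xffffffff812b6dae,
    0xffffffff812b6db2,
]
FAULT_RIP = 0xffffffff812b6dad


def one_bit_neighbours(addr, boundaries):
    """Return the boundaries that differ from addr in exactly one bit."""
    return [b for b in boundaries if bin(addr ^ b).count("1") == 1]


def decode_xor_eax_imm32(code):
    """Decode 'xor $imm32,%eax': opcode 0x35, then a little-endian imm32."""
    if len(code) < 5 or code[0] != 0x35:
        raise ValueError("not an 'xor $imm32,%eax' encoding")
    return int.from_bytes(code[1:5], "little")


# ASSUMPTION: bytes reconstructed from standard encodings, not from the dump.
# A short jl is 0x7c followed by rel8 = target - address of next instruction.
jl_rel8 = (0xffffffff812b6de3 - 0xffffffff812b6dae) & 0xff
jl_bytes = bytes([0x7c, jl_rel8])           # at memcpy+12 and memcpy+13
sub_bytes = bytes.fromhex("4883ea20")       # sub $0x20,%rdx at memcpy+14

# Bytes the CPU would see if it started decoding at memcpy+13.
misaligned = jl_bytes[1:] + sub_bytes

if __name__ == "__main__":
    for b in one_bit_neighbours(FAULT_RIP, BOUNDARIES):
        print(f"one bit away from the faulting rip: {b:#x}")
    print(f"xor ${decode_xor_eax_imm32(misaligned):#x},%eax")
```

Under these assumptions the displacement byte of the `jl` is itself 0x35, the opcode for `xor $imm32,%eax`, which is consistent with the gdb output quoted above.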
At this point I would normally ask if you had run memtest86 etc on the
machine (i.e. if the RAM is known to be solid), but this seems to be a
register and not memory related.

> It's a production machine so not much detailed further testing can be
> provided in time.
> The information below (bugreport) is executed from a different
> machine, so the info provided below is not matching the original
> machine where the error appears !

FYI it is possible to run reportbug on a machine but get it to write the
report to a file for transfer and sending from another machine.

Ian.

[0] http://security.debian.org/debian-security/pool/updates/main/l/linux/linux-image-3.16.0-4-amd64-dbg_3.16.7-ckt11-1+deb8u5_amd64.deb
    => /usr/lib/debug/vmlinux-3.16.0-4-amd64