On Wed, Mar 26, 2014 at 8:45 AM, Raphael Bauduin <rbli...@gmail.com> wrote:
> Hi,
>
> We have regular crashes of a kvm host with the error "unable to handle
> paging request".
> Can this be due to memory over-commitment even if some memory is still
> used by the kernel for caches and buffers? (collectd graph shows no free
> memory, with 15G used, very little buffers, and 1G cache). There are 32GB
> of swap, of which only 150MB are used.
>
> I suspect this might be the direction to search to find the cause, but would be
> happy to learn from people versed in the kernel behaviour to confirm or
> reject my hypothesis. Below is the full error.
>
> Thanks!
>
> Raph
>
> Mar 23 14:27:37 sMaster01 kernel: [241450.355339] BUG: unable to handle kernel paging request at ffff8804c001fade
> Mar 23 14:27:37 sMaster01 kernel: [241450.355384] IP: [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
> Mar 23 14:27:37 sMaster01 kernel: [241450.355433] PGD 1002063 PUD 0
> Mar 23 14:27:37 sMaster01 kernel: [241450.355464] Oops: 0000 [#1] SMP
> Mar 23 14:27:37 sMaster01 kernel: [241450.355496] last sysfs file: /sys/devices/system/cpu/cpu15/topology/thread_siblings
> Mar 23 14:27:37 sMaster01 kernel: [241450.355551] CPU 4
> Mar 23 14:27:37 sMaster01 kernel: [241450.355577] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp kvm_amd kvm ip6table_filter ip6_tables iptable_filter ip_tables x_tables tun nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp bonding dm_round_robin dm_multipath scsi_dh loop snd_pcm snd_timer snd soundcore snd_page_alloc serio_raw evdev tpm_tis tpm tpm_bios psmouse pcspkr amd64_edac_mod edac_core button edac_mce_amd shpchp i2c_piix4 container pci_hotplug i2c_core processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif mptsas mptscsih mptbase lpfc ehci_hcd scsi_transport_fc tg3 scsi_tgt scsi_transport_sas ohci_hcd libphy scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
> Mar 23 14:27:37 sMaster01 kernel: [241450.356084] Pid: 3557, comm: kjournald Not tainted 2.6.32.61vanilla #1 PRIMERGY BX630 S2
> Mar 23 14:27:37 sMaster01 kernel: [241450.356141] RIP: 0010:[<ffffffff8117e9e9>] [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
> Mar 23 14:27:37 sMaster01 kernel: [241450.356196] RSP: 0018:ffff8804229abba0 EFLAGS: 00010202
> Mar 23 14:27:37 sMaster01 kernel: [241450.356228] RAX: ffff8804c001fad6 RBX: ffff8802e7235080 RCX: 00011200061e5110
> Mar 23 14:27:37 sMaster01 kernel: [241450.356279] RDX: 0000000000000008 RSI: 0000000000000008 RDI: ffff8802e7235080
> Mar 23 14:27:37 sMaster01 kernel: [241450.356331] RBP: ffff8802e7235080 R08: 0000000000000000 R09: ffff880425c54c00
> Mar 23 14:27:37 sMaster01 kernel: [241450.356383] R10: 0000000000000003 R11: 00000000022e539e R12: ffff8802e7235080
> Mar 23 14:27:37 sMaster01 kernel: [241450.356434] R13: ffff8802e7235080 R14: ffff880425c54c00 R15: ffff8802e6281850
> Mar 23 14:27:37 sMaster01 kernel: [241450.356486] FS: 00007faa6a757820(0000) GS:ffff88000fc80000(0000) knlGS:0000000000000000
> Mar 23 14:27:37 sMaster01 kernel: [241450.356540] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Mar 23 14:27:37 sMaster01 kernel: [241450.356573] CR2: ffff8804c001fade CR3: 00000000cc11f000 CR4: 00000000000006e0
> Mar 23 14:27:37 sMaster01 kernel: [241450.356628] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Mar 23 14:27:37 sMaster01 kernel: [241450.356681] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Mar 23 14:27:37 sMaster01 kernel: [241450.356733] Process kjournald (pid: 3557, threadinfo ffff8804229aa000, task ffff88041490a300)
> Mar 23 14:27:37 sMaster01 kernel: [241450.356788] Stack:
> Mar 23 14:27:37 sMaster01 kernel: [241450.356812] ffff880415382c00 0000000100000285 ffff8804229abfd8 0000000000005186
> Mar 23 14:27:37 sMaster01 kernel: [241450.356852] <0> 0000000000000000 000000000f1c2776 ffff8804128efa38 ffff8802e7235080
> Mar 23 14:27:37 sMaster01 kernel: [241450.356913] <0> ffff8802e7235080 ffff8802e7235080 ffff8800cdacae40 ffffffff8117eb5a
> Mar 23 14:27:37 sMaster01 kernel: [241450.356993] Call Trace:
> Mar 23 14:27:37 sMaster01 kernel: [241450.357021] [<ffffffff8117eb5a>] ? generic_make_request+0xcd/0x2f9
> Mar 23 14:27:37 sMaster01 kernel: [241450.357058] [<ffffffff810b6034>] ? mempool_alloc+0x55/0x106
> Mar 23 14:27:37 sMaster01 kernel: [241450.357091] [<ffffffff8117ee5c>] ? submit_bio+0xd6/0xf2
> Mar 23 14:27:37 sMaster01 kernel: [241450.357125] [<ffffffff8110d83f>] ? submit_bh+0xf5/0x115
> Mar 23 14:27:37 sMaster01 kernel: [241450.357158] [<ffffffff8110edc0>] ? sync_dirty_buffer+0x51/0x93
> Mar 23 14:27:37 sMaster01 kernel: [241450.357196] [<ffffffffa01727c7>] ? journal_commit_transaction+0xaa6/0xe4f [jbd]
> Mar 23 14:27:37 sMaster01 kernel: [241450.357252] [<ffffffffa0175194>] ? kjournald+0xdf/0x226 [jbd]
> Mar 23 14:27:37 sMaster01 kernel: [241450.357288] [<ffffffff810651de>] ? autoremove_wake_function+0x0/0x2e
> Mar 23 14:27:37 sMaster01 kernel: [241450.357324] [<ffffffffa01750b5>] ? kjournald+0x0/0x226 [jbd]
> Mar 23 14:27:37 sMaster01 kernel: [241450.357357] [<ffffffff81064f11>] ? kthread+0x79/0x81
> Mar 23 14:27:37 sMaster01 kernel: [241450.357391] [<ffffffff81011baa>] ? child_rip+0xa/0x20
> Mar 23 14:27:37 sMaster01 kernel: [241450.357425] [<ffffffff81016568>] ? read_tsc+0xa/0x20
> Mar 23 14:27:37 sMaster01 kernel: [241450.357456] [<ffffffff81064e98>] ? kthread+0x0/0x81
> Mar 23 14:27:37 sMaster01 kernel: [241450.357487] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
> Mar 23 14:27:37 sMaster01 kernel: [241450.357517] Code: 5c c3 41 55 49 89 fd 41 54 55 53 48 83 ec 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 85 f6 0f 84 86 00 00 00 48 8b 47 10 <48> 8b 40 08 48 8b 40 68 48 c1 f8 09 74 74 89 f2 48 8b 0f 48 39
> Mar 23 14:27:37 sMaster01 kernel: [241450.357738] RIP [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
> Mar 23 14:27:37 sMaster01 kernel: [241450.357772] RSP <ffff8804229abba0>
> Mar 23 14:27:37 sMaster01 kernel: [241450.357799] CR2: ffff8804c001fade
> Mar 23 14:27:37 sMaster01 kernel: [241450.358183] ---[ end trace 608fcf1f5a482549 ]---

We had a guest crash with the same error, "unable to handle kernel paging request", but this time in the function __destroy_inode. Could faulty memory cause this problem on both host and guest?

Raph
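
In case it helps when comparing the host and guest oopses side by side, here is a minimal sketch that pulls the faulting address and function out of a syslog-style kernel log. The helper name `summarize_oops` and the two regular expressions are assumptions made for illustration, not part of any kernel or libvirt tooling; they are only matched against the "BUG:" and "IP:" line formats seen in the oops quoted above.

```python
import re

# Hypothetical helper (not an existing tool): matches the two oops lines
# that carry the faulting address and the faulting function.
FAULT_RE = re.compile(
    r"BUG: unable to handle kernel paging request at ([0-9a-f]+)"
)
# The leading space keeps this from matching the later "RIP:" line, and
# "RIP: 0010:[<...>]" would not fit the "IP: [<" pattern in any case.
IP_RE = re.compile(r" IP: \[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text):
    """Return (faulting_address, faulting_function) pairs found in text."""
    faults = FAULT_RE.findall(text)
    funcs = IP_RE.findall(text)
    return list(zip(faults, funcs))


# Two lines copied from the host oops above.
sample = (
    "Mar 23 14:27:37 sMaster01 kernel: [241450.355339] BUG: unable to "
    "handle kernel paging request at ffff8804c001fade\n"
    "Mar 23 14:27:37 sMaster01 kernel: [241450.355384] IP: "
    "[<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd\n"
)

print(summarize_oops(sample))
```

Running the same function over the guest's log would list its faulting function next to the host's, which makes it easier to see whether the faults cluster in one code path or are scattered.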
_______________________________________________ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users