2015-01-04 21:20 GMT+01:00 Erhan Ekici <erhan.ek...@gmail.com>:
> Hi Matej,
>
> Can you change the values (on both the compute and instance node) and see
> if the soft lock-ups are generated again? There are two approaches: one
> decreases the cache and the other increases it.
>
> vm.dirty_background_ratio = 5
> vm.dirty_ratio = 10
>
> If I/O & cache is not the source of the problem, we should focus on CPU
> settings or the kernel.
>
> For example, you can try a different kernel (before 3.18.1) and see if it
> happens again.
>
> Thanks,
> Erhan
>
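[Editor's note: a sketch of how the values Erhan suggests can be applied.
`sysctl -w` takes effect immediately but is lost on reboot; entries in
/etc/sysctl.conf persist and can be loaded with `sysctl -p`. The values
themselves are the ones from the thread; everything else is standard sysctl
usage, not part of the original mails.]

```
# Apply immediately as root (lost on reboot):
#   sysctl -w vm.dirty_background_ratio=5
#   sysctl -w vm.dirty_ratio=10

# Or persist the same values in /etc/sysctl.conf and load them with
# `sysctl -p`:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
```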
Hello Erhan,

I have changed both values as you suggested, on both the compute and the
instance node, and will now monitor whether the lock-ups occur again. In the
meantime: the kernel on the compute node is 3.11.0-19-generic and on the
instance node it is 3.18.1-031801-generic.

I will report back with the results.

Thanks,
Matej

> On Sun, Jan 4, 2015 at 8:39 PM, Matej Mailing <mail...@tam.si> wrote:
>>
>> 2015-01-04 17:10 GMT+01:00 Erhan Ekici <erhan.ek...@gmail.com>:
>> > Hi Matej,
>> >
>> > I am not sure, but most of the soft lock-up messages are generated due
>> > to I/O issues. From your log entries, I see mysqld issues as well as
>> > others:
>> >
>> > Jan 3 10:41:44 postar kernel: [303154.352793] NMI watchdog: BUG: soft
>> > lockup - CPU#3 stuck for 45s! [mysqld:4601]
>> > ....
>> > Jan 3 10:44:00 postar kernel: [303290.052498] INFO: task mysqld:2290
>> > blocked for more than 120 seconds.
>> > Jan 3 10:41:44 postar kernel: [303154.352958] CPU: 3 PID: 4601 Comm:
>> > mysqld
>> > Not tainted 3.18.1-031801-generic #201412170637
>> > ....
>> > Jan 3 10:44:00 postar kernel: [303290.052499] Tainted: G L
>> > 3.18.1-031801-generic #201412170637
>> > Jan 3 10:44:00 postar kernel: [303290.052499] "echo 0 >
>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Jan 3 10:44:00 postar kernel: [303290.052502] mysqld D
>> > 0000000000000002 0 2290 1 0x00000000
>> >
>> > Can you post the output of the following command on both the instance
>> > node OS and the compute node OS, along with the memory of the compute
>> > node and the instances?
>> >
>> > [user@node] sudo sysctl -a | grep dirty
>> >
>> > vm.dirty_background_bytes = 0
>> > vm.dirty_background_ratio = 10
>> > vm.dirty_bytes = 0
>> > vm.dirty_expire_centisecs = 3000
>> > vm.dirty_ratio = 20
>> > vm.dirty_writeback_centisecs = 500
>> >
>> > Erhan,
>>
>> Hello Erhan,
>>
>> instance node output:
>>
>> error: permission denied on key 'net.ipv4.route.flush'
>> error: permission denied on key 'net.ipv6.route.flush'
>> error: permission denied on key 'vm.compact_memory'
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
>>
>> Mem: 8175996k total, 4105708k used, 4070288k free, 218552k buffers
>>
>> compute node output:
>>
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
>>
>> KiB Mem: 16432492 total, 16254932 used, 177560 free, 9316 buffers
>>
>> Lock-ups happen with different processes; the above-mentioned mysqld is
>> just one example.
>>
>> Thanks,
>> Matej
>>
>> > On Sun, Jan 4, 2015 at 2:36 PM, Matej Mailing <mail...@tam.si> wrote:
>> >>
>> >> 2015-01-03 17:56 GMT+01:00 John Griffith <john.griff...@solidfire.com>:
>> >> > On Sat, Jan 3, 2015 at 5:05 AM, Matej Mailing <mail...@tam.si> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> we are experiencing randomly-timed soft lockup messages in
>> >> >> different instances (running CentOS, Ubuntu, etc.), with different
>> >> >> processes and on different compute nodes. I suspect that access to
>> >> >> the cinder storage via NFS could be the cause, but then wouldn't
>> >> >> all the instances trigger the error message at the same time?
>> >> >> Currently they don't, and no higher access loads are reported on
>> >> >> the NFS server itself, nor is its load higher at those times ...
>> >> >> What are the best methods to debug this error? Any suggestions
>> >> >> and experiences with fixing it will be very welcome :-)
>> >> >>
>> >> >> Thanks,
>> >> >> Matej
>> >> >>
>> >> >> _______________________________________________
>> >> >> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >> >> Post to : openstack@lists.openstack.org
>> >> >> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >> >
>> >> > Hi Matej,
>> >> >
>> >> > I'm assuming the kernel logs indicate NFS as the culprit here? What
>> >> > sort of IO load are you seeing on the system when these are
>> >> > encountered?
>> >> >
>> >> > A quick search on "soft lockup nfs" has a pretty lengthy list of
>> >> > results, including some setting recommendations etc. Maybe start
>> >> > there?
>> >> >
>> >> > Thanks,
>> >> > John
>> >>
>> >> Hello John,
>> >>
>> >> thank you very much for your answer. I can't find anything at all in
>> >> the compute node's logs, and my NFS idea is actually just a guess. An
>> >> example entry from the Ubuntu instances looks like:
>> >>
>> >> Jan 3 10:41:44 postar kernel: [303154.352793] NMI watchdog: BUG: soft
>> >> lockup - CPU#3 stuck for 45s!
>> >> [mysqld:4601]
>> >> Jan 3 10:41:44 postar kernel: [303154.352890] Modules linked in:
>> >> xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
>> >> nf_reject_ipv4 xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4
>> >> nf_defrag_ipv4 xt_state ip6table_filter ip6_tables nfsd nfs_acl
>> >> auth_rpcgss nf_conntrack_netbios_ns
>> >> nf_conntrack_broadcast nfs nf_nat_ftp nf_nat ppdev nf_conntrack_ftp
>> >> nf_conntrack fscache parport_pc parport lockd joydev pvpanic
>> >> iptable_filter cirrus ip_tables 8250_fintek ttm psmouse drm_kms_helper
>> >> sunrpc x_tables serio_raw hid_generic drm grace mac_hid sysimgblt
>> >> sysfillrect syscopyarea
>> >> i2c_piix4 usbhid hid floppy
>> >> Jan 3 10:41:44 postar kernel: [303154.352958] CPU: 3 PID: 4601 Comm:
>> >> mysqld Not tainted 3.18.1-031801-generic #201412170637
>> >> Jan 3 10:41:44 postar kernel: [303154.352961] Hardware name:
>> >> OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
>> >> Jan 3 10:41:44 postar kernel: [303154.352964] task: ffff880231b63c00
>> >> ti: ffff8801e6014000 task.ti: ffff8801e6014000
>> >> Jan 3 10:41:44 postar kernel: [303154.352966] RIP:
>> >> 0010:[<ffffffff813a97dc>] [<ffffffff813a97dc>]
>> >> copy_user_generic_string+0x2c/0x40
>> >> Jan 3 10:41:44 postar kernel: [303154.352979] RSP:
>> >> 0018:ffff8801e6017c90 EFLAGS: 00010246
>> >> Jan 3 10:41:44 postar kernel: [303154.352981] RAX: 00000000a2418000
>> >> RBX: ffff8801e6017ea0 RCX: 0000000000000200
>> >> Jan 3 10:41:44 postar kernel: [303154.352983] RDX: 0000000000000000
>> >> RSI: ffff8800a2418000 RDI: 00007fd365b90000
>> >> Jan 3 10:41:44 postar kernel: [303154.352985] RBP: ffff8801e6017d08
>> >> R08: ffffea000289061c R09: ffff8801e6017ca0
>> >> Jan 3 10:41:44 postar kernel: [303154.352987] R10: ffff8800a5c1b258
>> >> R11: 0000000000000293 R12: 0000000000000000
>> >> Jan 3 10:41:44 postar kernel: [303154.352989] R13: ffff8801e6017ec8
>> >> R14: 0000000000001000 R15: 00007fd365b90000
>> >> Jan 3 10:41:44 postar kernel: [303154.352995] FS:
>> >> 00007fd36c28e700(0000) GS:ffff88023fd80000(0000)
>> >> knlGS:0000000000000000
>> >> Jan 3 10:41:44 postar kernel: [303154.352998] CS: 0010 DS: 0000 ES:
>> >> 0000 CR0: 0000000080050033
>> >> Jan 3 10:41:44 postar kernel: [303154.353000] CR2: 0000000002b9e003
>> >> CR3: 00000002317ee000 CR4: 00000000000006e0
>> >> Jan 3 10:41:44 postar kernel: [303154.353014] Stack:
>> >> Jan 3 10:41:44 postar kernel: [303154.353016] ffffffff811a0f75
>> >> ffff8801e6017d88 ffff8800a5c1b258 ffff8801e6017cc8
>> >> Jan 3 10:41:44 postar kernel: [303154.353019] ffffffff8117864e
>> >> ffff8800a2418000 ffffea0002890600 0000000000001000
>> >> Jan 3 10:41:44 postar kernel: [303154.353023] ffff8800a2418000
>> >> 0000000000000000 ffffea0002890600 ffff880231e4b930
>> >> Jan 3 10:41:44 postar kernel: [303154.353026] Call Trace:
>> >> Jan 3 10:41:44 postar kernel: [303154.353034] [<ffffffff811a0f75>] ?
>> >> copy_page_to_iter_iovec+0xe5/0x300
>> >> Jan 3 10:41:44 postar kernel: [303154.353040] [<ffffffff8117864e>] ?
>> >> find_get_entry+0x1e/0x90
>> >> Jan 3 10:41:44 postar kernel: [303154.353044] [<ffffffff811a14a6>]
>> >> copy_page_to_iter+0x16/0x70
>> >> Jan 3 10:41:44 postar kernel: [303154.353047] [<ffffffff81179428>]
>> >> do_generic_file_read+0x1f8/0x490
>> >> Jan 3 10:41:44 postar kernel: [303154.353051] [<ffffffff8117a234>]
>> >> generic_file_read_iter+0xf4/0x150
>> >> Jan 3 10:41:44 postar kernel: [303154.353056] [<ffffffff810aade1>] ?
>> >> update_curr+0x141/0x1f0
>> >> Jan 3 10:41:44 postar kernel: [303154.353060] [<ffffffff810efcd8>] ?
>> >> get_futex_key+0x1f8/0x2e0
>> >> Jan 3 10:41:44 postar kernel: [303154.353065] [<ffffffff811eef28>]
>> >> new_sync_read+0x78/0xb0
>> >> Jan 3 10:41:44 postar kernel: [303154.353068] [<ffffffff811f013b>]
>> >> vfs_read+0xab/0x180
>> >> Jan 3 10:41:44 postar kernel: [303154.353071] [<ffffffff811f0402>]
>> >> SyS_pread64+0x92/0xa0
>> >> Jan 3 10:41:44 postar kernel: [303154.353077] [<ffffffff817b376d>]
>> >> system_call_fastpath+0x16/0x1b
>> >> Jan 3 10:41:44 postar kernel: [303154.353079] Code: 66 90 83 fa 08 72
>> >> 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48
>> >> ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0
>> >> 66 66 90 c3 0f 1f 80 00 00 00 00 66
>> >>
>> >>
>> >> and:
>> >>
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] INFO: rcu_sched
>> >> detected stalls on CPUs/tasks: { 3} (detected by 0, t=33923 jiffies,
>> >> g=1793081, c=1793080, q=0)
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] Task dump for CPU 3:
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] mysqld R
>> >> running task 0 4601 1 0x00000008
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] 0000000000000001
>> >> 00007fd36286c000 0000000000004000 0000000000004000
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] 00007fd36286c000
>> >> ffff8801e6017f28 ffffffff811f013b 00007fd36c28ba80
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] ffff8802316eaf01
>> >> ffff8802316eaf00 00007fd36286c000 0000000000004000
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] Call Trace:
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] [<ffffffff811f013b>] ?
>> >> vfs_read+0xab/0x180
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] [<ffffffff811f0402>] ?
>> >> SyS_pread64+0x92/0xa0
>> >> Jan 3 10:44:00 postar kernel: [303164.436029] [<ffffffff817b376d>] ?
>> >> system_call_fastpath+0x16/0x1b
>> >> Jan 3 10:44:00 postar kernel: [303290.052393] INFO: task
>> >> jbd2/vda1-8:256 blocked for more than 120 seconds.
>> >> Jan 3 10:44:00 postar kernel: [303290.052395] Tainted: G
>> >> L 3.18.1-031801-generic #201412170637
>> >> Jan 3 10:44:00 postar kernel: [303290.052396] "echo 0 >
>> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> Jan 3 10:44:00 postar kernel: [303290.052399] jbd2/vda1-8 D
>> >> 0000000000000002 0 256 2 0x00000000
>> >> Jan 3 10:44:00 postar kernel: [303290.052402] ffff88023170b6a8
>> >> 0000000000000046 ffff88023170b648 0000000000f2f754
>> >> Jan 3 10:44:00 postar kernel: [303290.052404] ffff88023170bfd8
>> >> 0000000000013640 ffff8800baef5f80 0000000000013640
>> >> Jan 3 10:44:00 postar kernel: [303290.052406] ffff8802342ab200
>> >> ffff880231b64600 ffff88023170b678 ffff88023fd13f40
>> >> Jan 3 10:44:00 postar kernel: [303290.052407] Call Trace:
>> >> Jan 3 10:44:00 postar kernel: [303290.052413] [<ffffffff817af549>]
>> >> schedule+0x29/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052415] [<ffffffff817af61f>]
>> >> io_schedule+0x8f/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052419] [<ffffffff81379f30>]
>> >> bt_get+0xf0/0x180
>> >> Jan 3 10:44:00 postar kernel: [303290.052423] [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan 3 10:44:00 postar kernel: [303290.052425] [<ffffffff8137a2f2>]
>> >> blk_mq_get_tag+0x82/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052428] [<ffffffff813762da>]
>> >> __blk_mq_alloc_request+0x2a/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052430] [<ffffffff81377d72>]
>> >> blk_mq_map_request+0x182/0x1f0
>> >> Jan 3 10:44:00 postar kernel: [303290.052432] [<ffffffff81377a36>] ?
>> >> __blk_mq_run_hw_queue+0xf6/0x260
>> >> Jan 3 10:44:00 postar kernel: [303290.052434] [<ffffffff81378446>]
>> >> blk_sq_make_request+0x96/0x310
>> >> Jan 3 10:44:00 postar kernel: [303290.052438] [<ffffffff81369c65>]
>> >> generic_make_request.part.75+0x75/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052440] [<ffffffff8136a288>]
>> >> generic_make_request+0x68/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052442] [<ffffffff8136a316>]
>> >> submit_bio+0x86/0x160
>> >> Jan 3 10:44:00 postar kernel: [303290.052446] [<ffffffff81177789>] ?
>> >> unlock_page+0x69/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052449] [<ffffffff81278b39>]
>> >> ext4_io_submit+0x29/0x50
>> >> Jan 3 10:44:00 postar kernel: [303290.052451] [<ffffffff8127427b>]
>> >> ext4_writepage+0xcb/0x240
>> >> Jan 3 10:44:00 postar kernel: [303290.052454] [<ffffffff811b16c0>] ?
>> >> SyS_msync+0x230/0x230
>> >> Jan 3 10:44:00 postar kernel: [303290.052457] [<ffffffff81182b87>]
>> >> __writepage+0x17/0x50
>> >> Jan 3 10:44:00 postar kernel: [303290.052460] [<ffffffff8118323a>]
>> >> write_cache_pages+0x20a/0x470
>> >> Jan 3 10:44:00 postar kernel: [303290.052462] [<ffffffff81182b70>] ?
>> >> set_page_dirty_lock+0x60/0x60
>> >> Jan 3 10:44:00 postar kernel: [303290.052464] [<ffffffff811834ea>]
>> >> generic_writepages+0x4a/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052468] [<ffffffff812bb73f>]
>> >> journal_submit_data_buffers+0xcf/0x180
>> >> Jan 3 10:44:00 postar kernel: [303290.052470] [<ffffffff812bc1c4>]
>> >> jbd2_journal_commit_transaction+0x394/0x1440
>> >> Jan 3 10:44:00 postar kernel: [303290.052474] [<ffffffff812c09b3>]
>> >> kjournald2+0xb3/0x230
>> >> Jan 3 10:44:00 postar kernel: [303290.052476] [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan 3 10:44:00 postar kernel: [303290.052478] [<ffffffff812c0900>] ?
>> >> commit_timeout+0x10/0x10
>> >> Jan 3 10:44:00 postar kernel: [303290.052481] [<ffffffff81093a09>]
>> >> kthread+0xc9/0xe0
>> >> Jan 3 10:44:00 postar kernel: [303290.052483] [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >> Jan 3 10:44:00 postar kernel: [303290.052485] [<ffffffff817b36bc>]
>> >> ret_from_fork+0x7c/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052487] [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >> Jan 3 10:44:00 postar kernel: [303290.052498] INFO: task mysqld:2290
>> >> blocked for more than 120 seconds.
>> >> Jan 3 10:44:00 postar kernel: [303290.052499] Tainted: G
>> >> L 3.18.1-031801-generic #201412170637
>> >> Jan 3 10:44:00 postar kernel: [303290.052499] "echo 0 >
>> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> Jan 3 10:44:00 postar kernel: [303290.052502] mysqld D
>> >> 0000000000000002 0 2290 1 0x00000000
>> >> Jan 3 10:44:00 postar kernel: [303290.052504] ffff88023122b9c8
>> >> 0000000000000082 ffff880200000002 0000000000f2f748
>> >> Jan 3 10:44:00 postar kernel: [303290.052506] ffff88023122bfd8
>> >> 0000000000013640 ffff880231b98380 0000000000013640
>> >> Jan 3 10:44:00 postar kernel: [303290.052508] ffff880037ab3200
>> >> ffff880233142800 ffff88023122b998 ffff88023fd13f40
>> >> Jan 3 10:44:00 postar kernel: [303290.052509] Call Trace:
>> >> Jan 3 10:44:00 postar kernel: [303290.052511] [<ffffffff817af549>]
>> >> schedule+0x29/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052531] [<ffffffff817af61f>]
>> >> io_schedule+0x8f/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052534] [<ffffffff81379f30>]
>> >> bt_get+0xf0/0x180
>> >> Jan 3 10:44:00 postar kernel: [303290.052536] [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan 3 10:44:00 postar kernel: [303290.052538] [<ffffffff8137a2f2>]
>> >> blk_mq_get_tag+0x82/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052540] [<ffffffff813762da>]
>> >> __blk_mq_alloc_request+0x2a/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052542] [<ffffffff81377d72>]
>> >> blk_mq_map_request+0x182/0x1f0
>> >> Jan 3 10:44:00 postar kernel: [303290.052545] [<ffffffff81378446>]
>> >> blk_sq_make_request+0x96/0x310
>> >> Jan 3 10:44:00 postar kernel: [303290.052547] [<ffffffff81369c65>]
>> >> generic_make_request.part.75+0x75/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052549] [<ffffffff8136a288>]
>> >> generic_make_request+0x68/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052551] [<ffffffff8136a316>]
>> >> submit_bio+0x86/0x160
>> >> Jan 3 10:44:00 postar kernel: [303290.052553] [<ffffffff812b940d>] ?
>> >> jbd2__journal_start+0x4d/0x60
>> >> Jan 3 10:44:00 postar kernel: [303290.052555] [<ffffffff81278b39>]
>> >> ext4_io_submit+0x29/0x50
>> >> Jan 3 10:44:00 postar kernel: [303290.052557] [<ffffffff81277446>]
>> >> ext4_writepages+0x4f6/0x7b0
>> >> Jan 3 10:44:00 postar kernel: [303290.052560] [<ffffffff810efcd8>] ?
>> >> get_futex_key+0x1f8/0x2e0
>> >> Jan 3 10:44:00 postar kernel: [303290.052563] [<ffffffff81184730>]
>> >> do_writepages+0x20/0x40
>> >> Jan 3 10:44:00 postar kernel: [303290.052565] [<ffffffff81179b29>]
>> >> __filemap_fdatawrite_range+0x59/0x60
>> >> Jan 3 10:44:00 postar kernel: [303290.052567] [<ffffffff81179b67>]
>> >> filemap_write_and_wait_range+0x37/0x80
>> >> Jan 3 10:44:00 postar kernel: [303290.052571] [<ffffffff8126c3a5>]
>> >> ext4_sync_file+0x105/0x290
>> >> Jan 3 10:44:00 postar kernel: [303290.052573] [<ffffffff811f0036>] ?
>> >> vfs_write+0x196/0x1f0
>> >> Jan 3 10:44:00 postar kernel: [303290.052576] [<ffffffff8121faa8>]
>> >> do_fsync+0x58/0x80
>> >> Jan 3 10:44:00 postar kernel: [303290.052578] [<ffffffff8121fe70>]
>> >> SyS_fsync+0x10/0x20
>> >> Jan 3 10:44:00 postar kernel: [303290.052580] [<ffffffff817b376d>]
>> >> system_call_fastpath+0x16/0x1b
>> >> Jan 3 10:44:00 postar kernel: [303290.052590] INFO: task
>> >> kworker/u8:1:22405 blocked for more than 120 seconds.
>> >> Jan 3 10:44:00 postar kernel: [303290.052592] Tainted: G
>> >> L 3.18.1-031801-generic #201412170637
>> >> Jan 3 10:44:00 postar kernel: [303290.052592] "echo 0 >
>> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> Jan 3 10:44:00 postar kernel: [303290.052595] kworker/u8:1 D
>> >> 0000000000000002 0 22405 2 0x00000000
>> >> Jan 3 10:44:00 postar kernel: [303290.052600] Workqueue: writeback
>> >> bdi_writeback_workfn (flush-253:0)
>> >> Jan 3 10:44:00 postar kernel: [303290.052602] ffff8801f38374c8
>> >> 0000000000000046 ffff8801f38374b8 0000000000f2f754
>> >> Jan 3 10:44:00 postar kernel: [303290.052604] ffff8801f3837fd8
>> >> 0000000000013640 ffff8800baef5f80 0000000000013640
>> >> Jan 3 10:44:00 postar kernel: [303290.052606] ffff880231b64600
>> >> ffff8801f397d000 ffff8801f3837498 ffff88023fd13f40
>> >> Jan 3 10:44:00 postar kernel: [303290.052607] Call Trace:
>> >> Jan 3 10:44:00 postar kernel: [303290.052609] [<ffffffff817af549>]
>> >> schedule+0x29/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052611] [<ffffffff817af61f>]
>> >> io_schedule+0x8f/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052614] [<ffffffff81379f30>]
>> >> bt_get+0xf0/0x180
>> >> Jan 3 10:44:00 postar kernel: [303290.052616] [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan 3 10:44:00 postar kernel: [303290.052618] [<ffffffff8137a2f2>]
>> >> blk_mq_get_tag+0x82/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052620] [<ffffffff813762da>]
>> >> __blk_mq_alloc_request+0x2a/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052622] [<ffffffff81377d72>]
>> >> blk_mq_map_request+0x182/0x1f0
>> >> Jan 3 10:44:00 postar kernel: [303290.052626] [<ffffffff814692de>] ?
>> >> alloc_indirect+0x1e/0x50
>> >> Jan 3 10:44:00 postar kernel: [303290.052628] [<ffffffff81378446>]
>> >> blk_sq_make_request+0x96/0x310
>> >> Jan 3 10:44:00 postar kernel: [303290.052630] [<ffffffff81369c65>]
>> >> generic_make_request.part.75+0x75/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052633] [<ffffffff8136a288>]
>> >> generic_make_request+0x68/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052634] [<ffffffff8136a316>]
>> >> submit_bio+0x86/0x160
>> >> Jan 3 10:44:00 postar kernel: [303290.052638] [<ffffffff81364e5b>] ?
>> >> bio_alloc_bioset+0xbb/0x220
>> >> Jan 3 10:44:00 postar kernel: [303290.052641] [<ffffffff81223ff0>]
>> >> _submit_bh+0x110/0x150
>> >> Jan 3 10:44:00 postar kernel: [303290.052643] [<ffffffff81224040>]
>> >> submit_bh+0x10/0x20
>> >> Jan 3 10:44:00 postar kernel: [303290.052646] [<ffffffff81225cc0>]
>> >> __block_write_full_page.constprop.35+0x1a0/0x380
>> >> Jan 3 10:44:00 postar kernel: [303290.052648] [<ffffffff81226110>] ?
>> >> I_BDEV+0x10/0x10
>> >> Jan 3 10:44:00 postar kernel: [303290.052650] [<ffffffff81226110>] ?
>> >> I_BDEV+0x10/0x10
>> >> Jan 3 10:44:00 postar kernel: [303290.052652] [<ffffffff81226039>]
>> >> block_write_full_page+0xa9/0xe0
>> >> Jan 3 10:44:00 postar kernel: [303290.052654] [<ffffffff81226b38>]
>> >> blkdev_writepage+0x18/0x20
>> >> Jan 3 10:44:00 postar kernel: [303290.052656] [<ffffffff81182b87>]
>> >> __writepage+0x17/0x50
>> >> Jan 3 10:44:00 postar kernel: [303290.052658] [<ffffffff8118323a>]
>> >> write_cache_pages+0x20a/0x470
>> >> Jan 3 10:44:00 postar kernel: [303290.052660] [<ffffffff81182b70>] ?
>> >> set_page_dirty_lock+0x60/0x60
>> >> Jan 3 10:44:00 postar kernel: [303290.052663] [<ffffffff811834ea>]
>> >> generic_writepages+0x4a/0x70
>> >> Jan 3 10:44:00 postar kernel: [303290.052665] [<ffffffff81184730>]
>> >> do_writepages+0x20/0x40
>> >> Jan 3 10:44:00 postar kernel: [303290.052668] [<ffffffff81217fe5>]
>> >> __writeback_single_inode+0x45/0x1c0
>> >> Jan 3 10:44:00 postar kernel: [303290.052670] [<ffffffff812199be>]
>> >> writeback_sb_inodes+0x22e/0x340
>> >> Jan 3 10:44:00 postar kernel: [303290.052673] [<ffffffff81219b6e>]
>> >> __writeback_inodes_wb+0x9e/0xd0
>> >> Jan 3 10:44:00 postar kernel: [303290.052675] [<ffffffff81219e2b>]
>> >> wb_writeback+0x28b/0x330
>> >> Jan 3 10:44:00 postar kernel: [303290.052678] [<ffffffff8120ac32>] ?
>> >> get_nr_dirty_inodes+0x52/0x80
>> >> Jan 3 10:44:00 postar kernel: [303290.052681] [<ffffffff81219f6f>]
>> >> wb_check_old_data_flush+0x9f/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052683] [<ffffffff8121a0b4>]
>> >> wb_do_writeback+0x134/0x1c0
>> >> Jan 3 10:44:00 postar kernel: [303290.052685] [<ffffffff810908bf>] ?
>> >> set_worker_desc+0x6f/0x80
>> >> Jan 3 10:44:00 postar kernel: [303290.052688] [<ffffffff8121c048>]
>> >> bdi_writeback_workfn+0x78/0x1f0
>> >> Jan 3 10:44:00 postar kernel: [303290.052729] [<ffffffff8108d50e>]
>> >> process_one_work+0x14e/0x460
>> >> Jan 3 10:44:00 postar kernel: [303290.052732] [<ffffffff8108de8b>]
>> >> worker_thread+0x11b/0x3f0
>> >> Jan 3 10:44:00 postar kernel: [303290.052735] [<ffffffff8108dd70>] ?
>> >> create_worker+0x1e0/0x1e0
>> >> Jan 3 10:44:00 postar kernel: [303290.052737] [<ffffffff81093a09>]
>> >> kthread+0xc9/0xe0
>> >> Jan 3 10:44:00 postar kernel: [303290.052739] [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >> Jan 3 10:44:00 postar kernel: [303290.052741] [<ffffffff817b36bc>]
>> >> ret_from_fork+0x7c/0xb0
>> >> Jan 3 10:44:00 postar kernel: [303290.052743] [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >>
>> >> The instance's qcow2 file is on an NFS share mounted on the compute
>> >> node.
>> >>
>> >> Different processes trigger the soft lock-up; what is interesting is
>> >> that we see soft lock-ups showing up at the same time on all the
>> >> instances running on the same compute node. I will examine iostat and
>> >> top output closely when it happens, though that moment is relatively
>> >> hard to catch since it lasts only a short time.
>> >>
>> >> Thank you for any ideas in advance.
>> >> Matej
>> >
>> >
>
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
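[Editor's note: since the lock-up window is short and hard to catch at the
console, a small background sampler on an affected instance is one way to
capture it. The sketch below is not from the thread; the log path, sample
count, and interval are arbitrary placeholders, and the iostat line assumes
the sysstat package.]

```shell
#!/bin/sh
# Rough sketch: periodically record a timestamp and the load averages so
# a short lock-up window is captured even when nobody is watching.
OUT=${OUT:-/tmp/lockup-monitor.log}   # placeholder log path
COUNT=${COUNT:-3}                     # samples to take; run much longer in practice
INTERVAL=${INTERVAL:-1}               # seconds between samples

i=0
while [ "$i" -lt "$COUNT" ]; do
    {
        date                  # timestamp, for correlating with syslog entries
        cat /proc/loadavg     # 1/5/15-minute load averages
        # iostat -x 1 1       # per-device I/O stats, if sysstat is installed
        echo
    } >> "$OUT"
    i=$((i + 1))
    sleep "$INTERVAL"
done
```

Cross-referencing the timestamps in this log with the kernel's soft-lockup
and hung-task messages should show whether the stalls coincide with I/O
pressure on the NFS-backed disk.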