Thanks, Jan. /proc/sys/vm/min_free_kbytes was set to 32M; I raised it to 256M on this system, which has 64 GB of RAM. My swappiness was also set to 0. We saw no problems in lab tests, but I wonder whether we hit some limit under 24/7 OSD operation.
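For reference, a rough sketch of the arithmetic behind those numbers. min_free_kbytes is expressed in kB, so this assumes that "32M" and "256M" mean MiB and that 64 GB means 64 GiB; the helper names are only illustrative and are not part of any Ceph or kernel tooling.

```python
from pathlib import Path

# The kernel exposes this tunable in kB.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")


def mib_to_kib(mib: int) -> int:
    """Convert a size in MiB to the kB value the sysctl expects."""
    return mib * 1024


def share_of_ram(reserve_kib: int, ram_gib: int) -> float:
    """Fraction of total RAM that the reserve represents."""
    return reserve_kib / (ram_gib * 1024 * 1024)


def read_current_kib(path: Path = MIN_FREE_PATH):
    """Return the current setting in kB, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for mib in (32, 256):
        kib = mib_to_kib(mib)
        print(f"{mib} MiB = {kib} kB = {share_of_ram(kib, 64):.4%} of 64 GiB")
    print("current setting (kB):", read_current_kib())
```

Under those assumptions the new value would be written as 262144, for example with `sysctl -w vm.min_free_kbytes=262144`, which is a little under 0.4% of RAM on this box.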
I will update after some days of running with these parameters.

Best regards,
Alex

On Fri, Jul 3, 2015 at 6:27 AM, Jan Schermer <j...@schermer.cz> wrote:

> What’s the value of /proc/sys/vm/min_free_kbytes on your system? Increase it to 256M (better do it if there’s lots of free memory) and see if it helps.
> It can also be set too high, hard to find any formula how to set it correctly...
>
> Jan
>
>
> On 03 Jul 2015, at 10:16, Alex Gorbachev <a...@iss-integration.com> wrote:
>
> Hello, we are experiencing severe OSD timeouts, OSDs are not taken out and we see the following in syslog on Ubuntu 14.04.2 with Firefly 0.80.9.
>
> Thank you for any advice.
>
> Alex
>
>
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261899] BUG: unable to handle kernel paging request at 000000190000001c
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261923] IP: [<ffffffff8118e476>] find_get_entries+0x66/0x160
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261941] PGD 1035954067 PUD 0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261955] Oops: 0000 [#1] SMP
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261969] Modules linked in: xfs libcrc32c ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich joydev mei_me mei ioatdma wmi 8021q ipmi_si garp 8250_fintek mrp ipmi_msghandler stp llc bonding mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid hid igb ahci mpt2sas mlx4_core i2c_algo_bit libahci dca raid_class ptp scsi_transport_sas pps_core arcmsr
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262182] CPU: 10 PID: 8711 Comm: ceph-osd Not tainted 4.1.0-040100-generic #201506220235
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262197] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262215] task: ffff8800721f1420 ti: ffff880fbad54000 task.ti: ffff880fbad54000
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262229] RIP: 0010:[<ffffffff8118e476>] [<ffffffff8118e476>] find_get_entries+0x66/0x160
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262248] RSP: 0018:ffff880fbad571a8 EFLAGS: 00010246
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262258] RAX: ffff880004000158 RBX: 000000000000000e RCX: 0000000000000000
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262303] RDX: ffff880004000158 RSI: ffff880fbad571c0 RDI: 0000001900000000
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262347] RBP: ffff880fbad57208 R08: 00000000000000c0 R09: 00000000000000ff
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262391] R10: 0000000000000000 R11: 0000000000000220 R12: 00000000000000b6
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262435] R13: ffff880fbad57268 R14: 000000000000000a R15: ffff880fbad572d8
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262479] FS: 00007f98cb0e0700(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262524] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262551] CR2: 000000190000001c CR3: 0000001034f0e000 CR4: 00000000000407e0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262596] Stack:
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262618] ffff880fbad571f8 ffff880cf6076b30 ffff880bdde05da8 00000000000000e6
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262669] 0000000000000100 ffff880cf6076b28 00000000000000b5 ffff880fbad57258
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262721] ffff880fbad57258 ffff880fbad572d8 ffffffffffffffff ffff880cf6076b28
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262772] Call Trace:
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262801] [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262831] [<ffffffff8119bd84>] truncate_inode_pages_range+0xf4/0x700
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262862] [<ffffffff8119c415>] truncate_inode_pages+0x15/0x20
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262891] [<ffffffff8119c53f>] truncate_inode_pages_final+0x5f/0xa0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262949] [<ffffffffc0431c2c>] xfs_fs_evict_inode+0x3c/0xe0 [xfs]
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262981] [<ffffffff81220558>] evict+0xb8/0x190
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263009] [<ffffffff81220671>] dispose_list+0x41/0x50
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263037] [<ffffffff8122176f>] prune_icache_sb+0x4f/0x60
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263067] [<ffffffff81208ab5>] super_cache_scan+0x155/0x1a0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263096] [<ffffffff8119d26f>] do_shrink_slab+0x13f/0x2c0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263126] [<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263157] [<ffffffff811b4189>] ? isolate_migratepages_block+0x299/0x5c0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263188] [<ffffffff8119d558>] shrink_slab+0xd8/0x110
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263217] [<ffffffff811a25bf>] shrink_zone+0x2cf/0x300
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263246] [<ffffffff811b4d3d>] ? compact_zone+0x7d/0x4f0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263275] [<ffffffff811a2a64>] shrink_zones+0x104/0x2a0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263304] [<ffffffff811b53ad>] ? compact_zone_order+0x5d/0x70
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263336] [<ffffffff810f1666>] ? ktime_get+0x46/0xb0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263365] [<ffffffff811a2cd7>] do_try_to_free_pages+0xd7/0x160
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263396] [<ffffffff811a3017>] try_to_free_pages+0xb7/0x170
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263427] [<ffffffff8119571a>] __alloc_pages_nodemask+0x5ba/0x9c0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263460] [<ffffffff811dc9bc>] alloc_pages_current+0x9c/0x110
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263492] [<ffffffff811e4f2a>] allocate_slab+0x20a/0x2e0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263522] [<ffffffff811e5031>] new_slab+0x31/0x1f0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263553] [<ffffffff817f8dd9>] __slab_alloc+0x18e/0x2a3
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263584] [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263614] [<ffffffff816d77e7>] ? __alloc_skb+0x57/0x2b0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263643] [<ffffffff811e9b7b>] __kmalloc_node_track_caller+0xbb/0x2b0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263675] [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263704] [<ffffffff816d737c>] __kmalloc_reserve.isra.57+0x3c/0xa0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263734] [<ffffffff816d7817>] __alloc_skb+0x87/0x2b0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263766] [<ffffffff81737de1>] sk_stream_alloc_skb+0x41/0x130
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263796] [<ffffffff817388b3>] tcp_sendmsg+0x2d3/0xa90
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263827] [<ffffffff81764477>] inet_sendmsg+0x67/0xa0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263858] [<ffffffff816cea54>] ? copy_msghdr_from_user+0x154/0x1b0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263891] [<ffffffff816cdcfd>] sock_sendmsg+0x4d/0x60
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263920] [<ffffffff816cef93>] ___sys_sendmsg+0x2b3/0x2c0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263950] [<ffffffff810a853c>] ? ttwu_do_wakeup+0x2c/0x100
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263979] [<ffffffff810a8826>] ? ttwu_do_activate.constprop.121+0x66/0x70
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264011] [<ffffffff810abef5>] ? try_to_wake_up+0x215/0x2a0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264040] [<ffffffff810abfb0>] ? wake_up_state+0x10/0x20
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264071] [<ffffffff810fce86>] ? wake_futex+0x76/0xb0
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264099] [<ffffffff810fe192>] ? futex_wake+0x72/0x140
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264127] [<ffffffff81222675>] ? __fget_light+0x25/0x70
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264155] [<ffffffff816cf9b9>] __sys_sendmsg+0x49/0x90
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264184] [<ffffffff816cfa19>] SyS_sendmsg+0x19/0x20
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264215] [<ffffffff8180d272>] system_call_fastpath+0x16/0x75
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264243] Code: 00 4c 89 65 c0 31 d2 e9 86 00 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 3a 48 85 ff 0f 84 ad 00 00 00 40 f6 c7 03 0f 85 a9 00 00 00 <8b> 4f 1c 85 c9 74 e3 8d 71 01 4c 8d 47 1c 89 c8 f0 0f b1 77 1c
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264467] RIP [<ffffffff8118e476>] find_get_entries+0x66/0x160
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264499] RSP <ffff880fbad571a8>
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264522] CR2: 000000190000001c
> Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264824] ---[ end trace ae271fe24c8d817e ]---
> Jul 3 03:45:01 roc-4r-sca020 CRON[801140]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> Jul 2 06:28:21 roc-4r-sca020 rsyslogd: message repeated 6 times: [ [origin software="rsyslogd" swVersion="7.4.4" x-pid="722" x-info="http://www.rsyslog.com"] rsyslogd was HUPed]
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com