Thanks Jan.  /proc/sys/vm/min_free_kbytes was set to 32M; I have set it
to 256M on a system with 64 GB RAM.  Swappiness was also set to 0, which
caused no problems in lab tests, but I wonder whether we are hitting
some limit under 24/7 OSD operation.
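
For reference, this is roughly how I am applying the change for now. A
minimal sketch, not part of our tooling: it assumes the standard procfs
files under /proc/sys/vm, must run as root, and writes 262144 because
min_free_kbytes is expressed in kilobytes (256 * 1024 = 262144):

  #!/usr/bin/env python
  # Sketch: inspect and raise the VM sysctls discussed in this thread
  # via procfs. Assumes the stock files under /proc/sys/vm; run as root.
  MIN_FREE = "/proc/sys/vm/min_free_kbytes"
  SWAPPINESS = "/proc/sys/vm/swappiness"

  def read_sysctl(path):
      with open(path) as f:
          return f.read().strip()

  print("min_free_kbytes before: " + read_sysctl(MIN_FREE))
  print("swappiness: " + read_sysctl(SWAPPINESS))

  with open(MIN_FREE, "w") as f:
      f.write("262144\n")  # 256M expressed in kB

  print("min_free_kbytes after: " + read_sysctl(MIN_FREE))

To make it persist across reboots, the equivalent /etc/sysctl.conf entry
would be "vm.min_free_kbytes = 262144".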

I will update after a few days of running with these parameters.  Best
regards, Alex

On Fri, Jul 3, 2015 at 6:27 AM, Jan Schermer <j...@schermer.cz> wrote:

> What’s the value of /proc/sys/vm/min_free_kbytes on your system?
> Increase it to 256M (better to do it while there’s still plenty of
> free memory) and see if it helps.
> It can also be set too high; it’s hard to find any formula for setting
> it correctly...
>
> Jan
>
>
> On 03 Jul 2015, at 10:16, Alex Gorbachev <a...@iss-integration.com> wrote:
>
> Hello, we are experiencing severe OSD timeouts; the OSDs are not taken
> out, and we see the following in syslog on Ubuntu 14.04.2 with Firefly
> 0.80.9.
>
> Thank you for any advice.
>
> Alex
>
>
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261899] BUG: unable to
> handle kernel paging request at 000000190000001c
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261923] IP:
> [<ffffffff8118e476>] find_get_entries+0x66/0x160
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261941] PGD 1035954067 PUD 0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261955] Oops: 0000 [#1] SMP
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261969] Modules linked in:
> xfs libcrc32c ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
> ablk_helper cryptd sb_edac edac_core lpc_ich joydev mei_me mei ioatdma
> wmi 8021q ipmi_si garp 8250_fintek mrp ipmi_msghandler stp llc bonding
> mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic
> usbhid hid igb ahci mpt2sas mlx4_core i2c_algo_bit libahci dca
> raid_class ptp scsi_transport_sas pps_core arcmsr
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262182] CPU: 10 PID: 8711
> Comm: ceph-osd Not tainted 4.1.0-040100-generic #201506220235
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262197] Hardware name:
> Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262215] task:
> ffff8800721f1420 ti: ffff880fbad54000 task.ti: ffff880fbad54000
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262229] RIP:
> 0010:[<ffffffff8118e476>]  [<ffffffff8118e476>] find_get_entries+0x66/0x160
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262248] RSP:
> 0018:ffff880fbad571a8  EFLAGS: 00010246
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262258] RAX:
> ffff880004000158 RBX: 000000000000000e RCX: 0000000000000000
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262303] RDX:
> ffff880004000158 RSI: ffff880fbad571c0 RDI: 0000001900000000
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262347] RBP:
> ffff880fbad57208 R08: 00000000000000c0 R09: 00000000000000ff
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262391] R10:
> 0000000000000000 R11: 0000000000000220 R12: 00000000000000b6
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262435] R13:
> ffff880fbad57268 R14: 000000000000000a R15: ffff880fbad572d8
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262479] FS:
>  00007f98cb0e0700(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262524] CS:  0010 DS: 0000
> ES: 0000 CR0: 0000000080050033
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262551] CR2:
> 000000190000001c CR3: 0000001034f0e000 CR4: 00000000000407e0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262596] Stack:
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262618]  ffff880fbad571f8
> ffff880cf6076b30 ffff880bdde05da8 00000000000000e6
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262669]  0000000000000100
> ffff880cf6076b28 00000000000000b5 ffff880fbad57258
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262721]  ffff880fbad57258
> ffff880fbad572d8 ffffffffffffffff ffff880cf6076b28
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262772] Call Trace:
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262801]
>  [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262831]
>  [<ffffffff8119bd84>] truncate_inode_pages_range+0xf4/0x700
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262862]
>  [<ffffffff8119c415>] truncate_inode_pages+0x15/0x20
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262891]
>  [<ffffffff8119c53f>] truncate_inode_pages_final+0x5f/0xa0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262949]
>  [<ffffffffc0431c2c>] xfs_fs_evict_inode+0x3c/0xe0 [xfs]
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262981]
>  [<ffffffff81220558>] evict+0xb8/0x190
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263009]
>  [<ffffffff81220671>] dispose_list+0x41/0x50
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263037]
>  [<ffffffff8122176f>] prune_icache_sb+0x4f/0x60
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263067]
>  [<ffffffff81208ab5>] super_cache_scan+0x155/0x1a0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263096]
>  [<ffffffff8119d26f>] do_shrink_slab+0x13f/0x2c0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263126]
>  [<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263157]
>  [<ffffffff811b4189>] ? isolate_migratepages_block+0x299/0x5c0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263188]
>  [<ffffffff8119d558>] shrink_slab+0xd8/0x110
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263217]
>  [<ffffffff811a25bf>] shrink_zone+0x2cf/0x300
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263246]
>  [<ffffffff811b4d3d>] ? compact_zone+0x7d/0x4f0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263275]
>  [<ffffffff811a2a64>] shrink_zones+0x104/0x2a0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263304]
>  [<ffffffff811b53ad>] ? compact_zone_order+0x5d/0x70
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263336]
>  [<ffffffff810f1666>] ? ktime_get+0x46/0xb0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263365]
>  [<ffffffff811a2cd7>] do_try_to_free_pages+0xd7/0x160
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263396]
>  [<ffffffff811a3017>] try_to_free_pages+0xb7/0x170
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263427]
>  [<ffffffff8119571a>] __alloc_pages_nodemask+0x5ba/0x9c0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263460]
>  [<ffffffff811dc9bc>] alloc_pages_current+0x9c/0x110
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263492]
>  [<ffffffff811e4f2a>] allocate_slab+0x20a/0x2e0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263522]
>  [<ffffffff811e5031>] new_slab+0x31/0x1f0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263553]
>  [<ffffffff817f8dd9>] __slab_alloc+0x18e/0x2a3
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263584]
>  [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263614]
>  [<ffffffff816d77e7>] ? __alloc_skb+0x57/0x2b0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263643]
>  [<ffffffff811e9b7b>] __kmalloc_node_track_caller+0xbb/0x2b0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263675]
>  [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263704]
>  [<ffffffff816d737c>] __kmalloc_reserve.isra.57+0x3c/0xa0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263734]
>  [<ffffffff816d7817>] __alloc_skb+0x87/0x2b0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263766]
>  [<ffffffff81737de1>] sk_stream_alloc_skb+0x41/0x130
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263796]
>  [<ffffffff817388b3>] tcp_sendmsg+0x2d3/0xa90
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263827]
>  [<ffffffff81764477>] inet_sendmsg+0x67/0xa0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263858]
>  [<ffffffff816cea54>] ? copy_msghdr_from_user+0x154/0x1b0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263891]
>  [<ffffffff816cdcfd>] sock_sendmsg+0x4d/0x60
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263920]
>  [<ffffffff816cef93>] ___sys_sendmsg+0x2b3/0x2c0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263950]
>  [<ffffffff810a853c>] ? ttwu_do_wakeup+0x2c/0x100
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263979]
>  [<ffffffff810a8826>] ? ttwu_do_activate.constprop.121+0x66/0x70
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264011]
>  [<ffffffff810abef5>] ? try_to_wake_up+0x215/0x2a0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264040]
>  [<ffffffff810abfb0>] ? wake_up_state+0x10/0x20
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264071]
>  [<ffffffff810fce86>] ? wake_futex+0x76/0xb0
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264099]
>  [<ffffffff810fe192>] ? futex_wake+0x72/0x140
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264127]
>  [<ffffffff81222675>] ? __fget_light+0x25/0x70
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264155]
>  [<ffffffff816cf9b9>] __sys_sendmsg+0x49/0x90
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264184]
>  [<ffffffff816cfa19>] SyS_sendmsg+0x19/0x20
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264215]
>  [<ffffffff8180d272>] system_call_fastpath+0x16/0x75
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264243] Code: 00 4c 89 65
> c0 31 d2 e9 86 00 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 3a 48 85 ff 0f
> 84 ad 00 00 00 40 f6 c7 03 0f 85 a9 00 00 00 <8b> 4f 1c 85 c9 74 e3 8d
> 71 01 4c 8d 47 1c 89 c8 f0 0f b1 77 1c
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264467] RIP
>  [<ffffffff8118e476>] find_get_entries+0x66/0x160
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264499]  RSP
> <ffff880fbad571a8>
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264522] CR2: 000000190000001c
> Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264824] ---[ end trace
> ae271fe24c8d817e ]---
> Jul  3 03:45:01 roc-4r-sca020 CRON[801140]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 1 1)
> Jul  2 06:28:21 roc-4r-sca020 rsyslogd: message repeated 6 times: [
> [origin software="rsyslogd" swVersion="7.4.4" x-pid="722"
> x-info="http://www.rsyslog.com"] rsyslogd was HUPed]
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
