> On 8 Mar 2019, at 14.30, Mark Nelson <mnel...@redhat.com> wrote:
>
> On 3/8/19 5:56 AM, Steffen Winther Sørensen wrote:
>
>>> On 5 Mar 2019, at 10.02, Paul Emmerich <paul.emmer...@croit.io> wrote:
>>>
>>> Yeah, there's a bug in 13.2.4. You need to set it to at least ~1.2GB.
>>
>> Yeap, thanks, setting it at 1G+256M worked :)
>> Hope this won’t bloat memory during the coming weekend's VM backups through CephFS.
>
> FWIW, setting it to 1.2G will almost certainly result in the bluestore caches being stuck at cache_min, i.e. 128MB, and the autotuner may not be able to keep the OSD memory that low. I typically recommend a bare minimum of 2GB per OSD, and on SSD/NVMe backed OSDs 3-4+ GB can improve performance significantly.

This is a smaller dev cluster, not much IO: 4 nodes of 16GB RAM & 6x HDD OSDs.
Just want to avoid consuming swap, which bloated after patching from 13.2.2 to 13.2.4 and performing VM snapshots to CephFS. Otherwise the cluster has been fine for ages…

/Steffen
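For reference, a minimal [osd] snippet along the lines of Mark's 2 GB-per-OSD recommendation might look like the sketch below; the exact byte value is only an illustration (not something tested on this cluster), and per Paul's note anything much below ~1.2 GB trips the 13.2.4 bug:

    [osd]
    ; roughly 2 GiB per OSD, the bare minimum suggested above for HDD-backed OSDs;
    ; values much below ~1.2 GiB hit the 13.2.4 autotuner bug shown further down the thread
    osd memory target = 2147483648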
>
> Mark
>
>>> On Tue, Mar 5, 2019 at 9:00 AM Steffen Winther Sørensen <ste...@gmail.com> wrote:
>>>>
>>>> On 4 Mar 2019, at 16.09, Paul Emmerich <paul.emmer...@croit.io> wrote:
>>>>
>>>> Bloated to ~4 GB per OSD and you are on HDDs?
>>>>
>>>> Something like that, yes.
>>>>
>>>> 13.2.3 backported the cache auto-tuning which targets 4 GB memory usage by default.
>>>>
>>>> See https://ceph.com/releases/13-2-4-mimic-released/
>>>>
>>>> Right, thanks…
>>>>
>>>> The bluestore_cache_* options are no longer needed. They are replaced by osd_memory_target, defaulting to 4GB. BlueStore will expand and contract its cache to attempt to stay within this limit. Users upgrading should note this is a higher default than the previous bluestore_cache_size of 1GB, so OSDs using BlueStore will use more memory by default. For more details, see the BlueStore docs.
>>>>
>>>> Adding an 'osd memory target' value to our ceph.conf and restarting an OSD just makes the OSD dump like this:
>>>>
>>>> [osd]
>>>> ; this key makes 13.2.4 OSDs abort???
>>>> osd memory target = 1073741824
>>>>
>>>> ; other OSD key settings
>>>> osd pool default size = 2      # Write an object 2 times.
>>>> osd pool default min size = 1  # Allow writing one copy in a degraded state.
>>>>
>>>> osd pool default pg num = 256
>>>> osd pool default pgp num = 256
>>>>
>>>> client cache size = 131072
>>>> osd client op priority = 40
>>>> osd op threads = 8
>>>> osd client message size cap = 512
>>>> filestore min sync interval = 10
>>>> filestore max sync interval = 60
>>>>
>>>> recovery max active = 2
>>>> recovery op priority = 30
>>>> osd max backfills = 2
>>>>
>>>> osd log snippet:
>>>>
>>>> -472> 2019-03-05 08:36:02.233 7f2743a8c1c0 1 -- - start start
>>>> -471> 2019-03-05 08:36:02.234 7f2743a8c1c0 2 osd.12 0 init /var/lib/ceph/osd/ceph-12 (looks like hdd)
>>>> -470> 2019-03-05 08:36:02.234 7f2743a8c1c0 2 osd.12 0 journal /var/lib/ceph/osd/ceph-12/journal
>>>> -469> 2019-03-05 08:36:02.234 7f2743a8c1c0 1 bluestore(/var/lib/ceph/osd/ceph-12) _mount path /var/lib/ceph/osd/ceph-12
>>>> -468> 2019-03-05 08:36:02.235 7f2743a8c1c0 1 bdev create path /var/lib/ceph/osd/ceph-12/block type kernel
>>>> -467> 2019-03-05 08:36:02.235 7f2743a8c1c0 1 bdev(0x55b31af4a000 /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
>>>> -466> 2019-03-05 08:36:02.236 7f2743a8c1c0 1 bdev(0x55b31af4a000 /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c800000, 137 GiB) block_size 4096 (4 KiB) rotational
>>>> -465> 2019-03-05 08:36:02.236 7f2743a8c1c0 1 bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
>>>> -464> 2019-03-05 08:36:02.237 7f2743a8c1c0 1 bdev create path /var/lib/ceph/osd/ceph-12/block type kernel
>>>> -463> 2019-03-05 08:36:02.237 7f2743a8c1c0 1 bdev(0x55b31af4aa80 /var/lib/ceph/osd/ceph-12/block) open path /var/lib/ceph/osd/ceph-12/block
>>>> -462> 2019-03-05 08:36:02.238 7f2743a8c1c0 1 bdev(0x55b31af4aa80 /var/lib/ceph/osd/ceph-12/block) open size 146775474176 (0x222c800000, 137 GiB) block_size 4096 (4 KiB) rotational
>>>> -461> 2019-03-05 08:36:02.238 7f2743a8c1c0 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-12/block size 137 GiB
>>>> -460> 2019-03-05 08:36:02.238 7f2743a8c1c0 1 bluefs mount
>>>> -459> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option compaction_readahead_size = 2097152
>>>> -458> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option compression = kNoCompression
>>>> -457> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option max_write_buffer_number = 4
>>>> -456> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option min_write_buffer_number_to_merge = 1
>>>> -455> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option recycle_log_file_num = 4
>>>> -454> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option writable_file_max_buffer_size = 0
>>>> -453> 2019-03-05 08:36:02.339 7f2743a8c1c0 0 set rocksdb option write_buffer_size = 268435456
>>>> -452> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option compaction_readahead_size = 2097152
>>>> -451> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option compression = kNoCompression
>>>> -450> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option max_write_buffer_number = 4
>>>> -449> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option min_write_buffer_number_to_merge = 1
>>>> -448> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option recycle_log_file_num = 4
>>>> -447> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option writable_file_max_buffer_size = 0
>>>> -446> 2019-03-05 08:36:02.340 7f2743a8c1c0 0 set rocksdb option write_buffer_size = 268435456
>>>> -445> 2019-03-05 08:36:02.340 7f2743a8c1c0 1 rocksdb: do_open column families: [default]
>>>> -444> 2019-03-05 08:36:02.341 7f2743a8c1c0 4 rocksdb: RocksDB version: 5.13.0
>>>> -443> 2019-03-05 08:36:02.342 7f2743a8c1c0 4 rocksdb: Git sha rocksdb_build_git_sha:@0@
>>>> -442> 2019-03-05 08:36:02.342 7f2743a8c1c0 4 rocksdb: Compile date Jan 4 2019
>>>> ...
>>>> -271> 2019-03-05 08:36:02.431 7f2743a8c1c0 1 freelist init
>>>> -270> 2019-03-05 08:36:02.535 7f2743a8c1c0 1 bluestore(/var/lib/ceph/osd/ceph-12) _open_alloc opening allocation metadata
>>>> -269> 2019-03-05 08:36:02.714 7f2743a8c1c0 1 bluestore(/var/lib/ceph/osd/ceph-12) _open_alloc loaded 93 GiB in 31828 extents
>>>> -268> 2019-03-05 08:36:02.722 7f2743a8c1c0 2 osd.12 0 journal looks like hdd
>>>> -267> 2019-03-05 08:36:02.722 7f2743a8c1c0 2 osd.12 0 boot
>>>> -266> 2019-03-05 08:36:02.723 7f272a0f3700 5 bluestore.MempoolThread(0x55b31af46a30) _tune_cache_size target: 1073741824 heap: 64675840 unmapped: 786432 mapped: 63889408 old cache_size: 134217728 new cache size: 17349132402135320576
>>>> -265> 2019-03-05 08:36:02.723 7f272a0f3700 5 bluestore.MempoolThread(0x55b31af46a30) _trim_shards cache_size: 17349132402135320576 kv_alloc: 134217728 kv_used: 5099462 meta_alloc: 0 meta_used: 21301 data_alloc: 0 data_used: 0
>>>> ...
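Side note on the log above: the absurd "new cache size" (17349132402135320576) looks like a negative intermediate result wrapped into an unsigned 64-bit integer, which fits Paul's remark that targets below roughly 1.2 GB trip a bug in 13.2.4. A minimal sketch of that wrap-around mechanism, with purely illustrative numbers rather than the actual BlueStore autotuner code:

    // underflow_sketch.cc -- hypothetical illustration, not Ceph source code
    #include <cstdint>
    #include <iostream>

    int main() {
        uint64_t target   = 1073741824;   // 1 GiB osd_memory_target, as in the ceph.conf above
        uint64_t overhead = 1288490189;   // ~1.2 GiB of assumed base/heap overhead (made-up figure)

        // If the autotuner-style subtraction goes negative, unsigned arithmetic
        // wraps around 2^64 and yields a huge bogus cache size, the same order
        // of magnitude as the 17349132402135320576 seen in the log.
        uint64_t new_cache_size = target - overhead;
        std::cout << "new cache size: " << new_cache_size << "\n";
        return 0;
    }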
>>>> 2019-03-05 08:36:40.166 7f03fc57f700 1 osd.12 pg_epoch: 7063 pg[2.93( v 6687'5 (0'0,6687'5] local-lis/les=7015/7016 n=1 ec=103/103 lis/c 7015/7015 les/c/f 7016/7016/0 7063/7063/7063) [12,19] r=0 lpr=7063 pi=[7015,7063)/1 crt=6687'5 lcod 0'0 mlcod 0'0 unknown NOTIFY mbc={}] start_peering_interval up [19] -> [12,19], acting [19] -> [12,19], acting_primary 19 -> 12, up_primary 19 -> 12, role -1 -> 0, features acting 4611087854031142907 upacting 4611087854031142907
>>>> 2019-03-05 08:36:40.167 7f03fc57f700 1 osd.12 pg_epoch: 7063 pg[2.93( v 6687'5 (0'0,6687'5] local-lis/les=7015/7016 n=1 ec=103/103 lis/c 7015/7015 les/c/f 7016/7016/0 7063/7063/7063) [12,19] r=0 lpr=7063 pi=[7015,7063)/1 crt=6687'5 lcod 0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
>>>> 2019-03-05 08:36:40.167 7f03fb57d700 1 osd.12 pg_epoch: 7061 pg[2.40( v 6964'703 (0'0,6964'703] local-lis/les=6999/7000 n=1 ec=103/103 lis/c 6999/6999 les/c/f 7000/7000/0 7061/7061/6999) [8] r=-1 lpr=7061 pi=[6999,7061)/1 crt=6964'703 lcod 0'0 unknown mbc={}] start_peering_interval up [8,12] -> [8], acting [8,12] -> [8], acting_primary 8 -> 8, up_primary 8 -> 8, role 1 -> -1, features acting 4611087854031142907 upacting 4611087854031142907
>>>>    1/ 5 heartbeatmap
>>>>    1/ 5 perfcounter
>>>>    1/ 5 rgw
>>>>    1/ 5 rgw_sync
>>>>    1/10 civetweb
>>>>    1/ 5 javaclient
>>>>    1/ 5 asok
>>>>    1/ 1 throttle
>>>>    0/ 0 refs
>>>>    1/ 5 xio
>>>>    1/ 5 compressor
>>>>    1/ 5 bluestore
>>>>    1/ 5 bluefs
>>>>    1/ 3 bdev
>>>>    1/ 5 kstore
>>>>    4/ 5 rocksdb
>>>>    4/ 5 leveldb
>>>>    4/ 5 memdb
>>>>    1/ 5 kinetic
>>>>    1/ 5 fuse
>>>>    1/ 5 mgr
>>>>    1/ 5 mgrc
>>>>    1/ 5 dpdk
>>>>    1/ 5 eventtrace
>>>>   -2/-2 (syslog threshold)
>>>>   -1/-1 (stderr threshold)
>>>>   max_recent 10000
>>>>   max_new 1000
>>>>   log_file /var/log/ceph/ceph-osd.12.log
>>>> --- end dump of recent events ---
>>>>
>>>> 2019-03-05 08:36:07.750 7f272a0f3700 -1 *** Caught signal (Aborted) **
>>>>  in thread 7f272a0f3700 thread_name:bstore_mempool
>>>>
>>>>  ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
>>>>  1: (()+0x911e70) [0x55b318337e70]
>>>>  2: (()+0xf5d0) [0x7f2737a4e5d0]
>>>>  3: (gsignal()+0x37) [0x7f2736a6f207]
>>>>  4: (abort()+0x148) [0x7f2736a708f8]
>>>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x242) [0x7f273aec62b2]
>>>>  6: (()+0x25a337) [0x7f273aec6337]
>>>>  7: (()+0x7a886e) [0x55b3181ce86e]
>>>>  8: (BlueStore::MempoolThread::entry()+0x3b0) [0x55b3181d0060]
>>>>  9: (()+0x7dd5) [0x7f2737a46dd5]
>>>>  10: (clone()+0x6d) [0x7f2736b36ead]
>>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> Even without the ‘osd memory target’ conf key, OSD claims on start:
>>>>
>>>> bluestore(/var/lib/ceph/osd/ceph-12) _set_cache_sizes cache_size 1073741824
>>>>
>>>> Any hints appreciated!
>>>>
>>>> /Steffen
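As a back-of-envelope check of that _set_cache_sizes line: 1073741824 bytes matches the 1 GB bluestore_cache_size default mentioned in the release notes quoted above, and with the ratios printed in the startup log further up (meta 0.4, kv 0.4, data 0.2) the split works out as in the small sketch below (an illustrative calculation only, not Ceph code):

    // cache_split_sketch.cc -- back-of-envelope math for the _set_cache_sizes log line
    #include <cstdint>
    #include <cstdio>

    int main() {
        const uint64_t cache_size = 1073741824;          // 1 GiB, as printed in the log
        const double   meta = 0.4, kv = 0.4, data = 0.2; // ratios from the same log line
        const double   mib = 1024.0 * 1024.0;

        std::printf("meta: %.0f MiB\n", cache_size * meta / mib);  // ~410 MiB
        std::printf("kv:   %.0f MiB\n", cache_size * kv   / mib);  // ~410 MiB
        std::printf("data: %.0f MiB\n", cache_size * data / mib);  // ~205 MiB
        return 0;
    }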
>>>> Paul
>>>>
>>>> --
>>>> Paul Emmerich
>>>>
>>>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>>>
>>>> croit GmbH
>>>> Freseniusstr. 31h
>>>> 81247 München
>>>> www.croit.io
>>>> Tel: +49 89 1896585 90
>>>>
>>>> On Mon, Mar 4, 2019 at 3:55 PM Steffen Winther Sørensen <ste...@gmail.com> wrote:
>>>>
>>>> List Members,
>>>>
>>>> I patched a CentOS 7 based cluster from 13.2.2 to 13.2.4 last Monday, and everything appeared to be working fine.
>>>>
>>>> Only this morning I found all OSDs in the cluster to be bloated in memory footprint, possibly after the weekend backup through MDS.
>>>>
>>>> Is anyone else seeing a possible memory leak in 13.2.4 OSDs, primarily when using MDS?
>>>>
>>>> TIA
>>>>
>>>> /Steffen
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com