Hi Joao,

We followed your instructions to create the store dump:

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list > store.dump

For the above store's location (let's call it $STORE), we then ran:

for m in osdmap pgmap; do
  for k in first_committed last_committed; do
    ceph-kvstore-tool $STORE get $m $k >> store.dump
  done
done

ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump
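
In case the full store itself is needed as well, it could be packed up for
sharing along these lines (just a sketch; the archive name is arbitrary and
the monitor should be stopped first, as in your step 2.2):

tar czf mon-FOO-store.tar.gz -C /var/lib/ceph/mon/ceph-FOO store.db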


Please find the store dump at the following link.

http://jmp.sh/LUh6iWo


-- 
Thanks & Regards
K.Mohamed Pakkeer



On Mon, Feb 16, 2015 at 8:14 PM, Joao Eduardo Luis <j...@redhat.com> wrote:

> On 02/16/2015 12:57 PM, Mohamed Pakkeer wrote:
>
>>
>>   Hi ceph-experts,
>>
>>    We are getting "store is getting too big" warnings on our test cluster.
>> The cluster is running the giant release and is configured with an EC pool
>> to test CephFS.
>>
>> cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
>>       health HEALTH_WARN too few pgs per osd (0 < min 20);
>>              mon.master01 store is getting too big! 15376 MB >= 15360 MB;
>>              mon.master02 store is getting too big! 15402 MB >= 15360 MB;
>>              mon.master03 store is getting too big! 15402 MB >= 15360 MB;
>>              clock skew detected on mon.master02, mon.master03
>>       monmap e3: 3 mons at
>> {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0},
>> election epoch 38, quorum 0,1,2 master01,master02,master03
>>       osdmap e97396: 552 osds: 552 up, 552 in
>>        pgmap v354736: 0 pgs, 0 pools, 0 bytes data, 0 objects
>>              8547 GB used, 1953 TB / 1962 TB avail
>>
>> We tried restarting the monitors with 'mon compact on start = true' as
>> well as manual compaction using 'ceph tell mon.FOO compact', but it
>> didn't reduce the size of store.db. We have already deleted the pools and
>> mds to start a fresh cluster. Do we need to delete the mons and recreate
>> them, or is there another way to reduce the store size?
>>
>
> Could you get us a list of all the keys in the store using
> 'ceph-kvstore-tool'? Instructions are in the email you quoted.
>
> Cheers!
>
>   -Joao
>
>
>> Regards,
>> K.Mohamed Pakkeer
>>
>>
>>
>> On 12/10/2014 07:30 PM, Kevin Sumner wrote:
>>
>>     The mons have grown another 30GB each overnight (except for 003?),
>> which
>>     is quite worrying.  I ran a little bit of testing yesterday after my
>>     post, but not a significant amount.
>>
>>     I wouldn’t expect compact on start to help this situation based on the
>>     name since we don’t (shouldn’t?) restart the mons regularly, but there
>>     appears to be no documentation on it.  We’re pretty good on disk space
>>     on the mons currently, but if that changes, I’ll probably use this to
>>     see about bringing these numbers in line.
>>
>> This is an issue that has been seen on larger clusters, and it usually
>> takes a monitor restart, with 'mon compact on start = true' or manual
>> compaction 'ceph tell mon.FOO compact' to bring the monitor back to a
>> sane disk usage level.
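>>
>> For reference, the two usually look something like this (putting the
>> option under [mon] in ceph.conf is an assumption about your config
>> layout; it can also live under [global]):
>>
>> # ceph.conf on the monitor hosts, takes effect on the next mon restart
>> [mon]
>>     mon compact on start = true
>>
>> # or on-demand compaction, no restart needed
>> ceph tell mon.FOO compact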
>>
>> However, I have not been able to reproduce this in order to track down
>> the source. I'm guessing I lack the scale of the cluster or the appropriate
>> workload (maybe both).
>>
>> What kind of workload are you running the cluster through? You mention
>> cephfs, but do you have any more info you can share that could help us
>> reproduce this state?
>>
>> Sage also fixed an issue that could potentially cause this (depending on
>> what is causing it in the first place) [1,2,3]. This bug, #9987, is due
>> to a given cached value not being updated, leading to the monitor not
>> removing unnecessary data, potentially causing this growth. This cached
>> value would be set to its proper value when the monitor is restarted
>> though, so a simple restart would have all this unnecessary data blown
>> away.
>>
>> Restarting the monitor ends up masking the true cause of the store
>> growth: whether from #9987 or from obsolete data kept by the monitor's
>> backing store (leveldb), either due to misuse of leveldb or due to
>> leveldb's nature (haven't been able to ascertain which may be at fault,
>> partly due to being unable to reproduce the problem).
>>
>> If you are up to it, I would suggest the following approach in the hope
>> of determining what may be at fault:
>>
>> 1) 'ceph tell mon.FOO compact' -- this will force the monitor to
>> compact its store. It won't close leveldb, so it won't have much
>> effect on the store size if it happens to be leveldb holding on to some
>> data (I could go into further detail, but I don't think this is the
>> right medium).
>> 1.a) You may notice the store increasing in size during this period;
>> this is expected.
>> 1.b) Compaction may take a while, but in the end you'll hopefully see a
>> significant reduction in size.
>>
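>> To keep an eye on the store size while step 1 runs, something simple
>> like this should do (path as used in step 2.3 below):
>>
>> watch -n 30 du -sh /var/lib/ceph/mon/ceph-FOO/store.db
>>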
>> 2) Assuming that failed, I would suggest doing the following:
>>
>> 2.1) grab ceph-kvstore-tool from the ceph-test package
>> 2.2) stop the monitor
>>
>> 2.3) run 'ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list >
>> store.dump'
>>
>> 2.4) run the following (for the above store's location, let's call it $STORE):
>>
>> for m in osdmap pgmap; do
>>    for k in first_committed last_committed; do
>>      ceph-kvstore-tool $STORE get $m $k >> store.dump
>>    done
>> done
>>
>> ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
>> ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump
>>
>> 2.5) send over the results of the dump
>>
>> 2.6) if you could compress the store as well and send me a link to
>> grab it, I would appreciate it.
>>
>> 3) Next you could simply restart the monitor (without 'mon compact on
>> start = true'); if the monitor's store size decreases, then there's a
>> fair chance that you've been bit by #9987. Otherwise, it may be
>> leveldb's clutter. You should also note that leveldb may itself compact
>> automatically on start, so it's hard to say for sure what fixed what.
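>>
>> For completeness, a plain restart would be something along these lines,
>> depending on which init system your monitor hosts use:
>>
>> service ceph restart mon.FOO     # sysvinit
>> restart ceph-mon id=FOO          # upstart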
>>
>> 4) If store size hasn't gone back to sane levels by now, you may wish to
>> restart with 'mon compact on start = true' and see if it helps. If it
>> doesn't, then we may have a completely different issue in our hands.
>>
>> Now, assuming your store size went down on step 3, and if you are
>> willing, it would be interesting to see if Sage's patches help out in
>> any way. The patches have not been backported to the giant branch yet,
>> so you would have to apply them yourself. For them to work you would
>> have to run the patched monitor as the leader. I would suggest leaving
>> the other monitors running an unpatched version so they could act as the
>> control group.
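>>
>> If you decide to try that, one (untested) way to get the patches onto
>> giant is to cherry-pick the two commits listed in [2] and [3] into a
>> local build, roughly:
>>
>> git clone https://github.com/ceph/ceph.git && cd ceph
>> git checkout -b giant-9987 origin/giant   # local branch name is arbitrary
>> git cherry-pick 093c5f0cabeb552b90d944da2c50de48fcf6f564 \
>>                 3fb731b722c50672a5a9de0c86a621f5f50f2d06
>> # then rebuild/package ceph-mon and deploy it on the mon you want as leader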
>>
>> Let us know if any of this helps.
>>
>> Cheers!
>>
>>    -Joao
>>
>> [1] - http://tracker.ceph.com/issues/9987
>> [2] - 093c5f0cabeb552b90d944da2c50de48fcf6f564
>> [3] - 3fb731b722c50672a5a9de0c86a621f5f50f2d06
>>
>>     :: ~ » ceph health detail | grep 'too big'
>>     HEALTH_WARN mon.cluster4-monitor001 store is getting too big! 77365 MB >= 15360 MB;
>>     mon.cluster4-monitor002 store is getting too big! 87868 MB >= 15360 MB;
>>     mon.cluster4-monitor003 store is getting too big! 30359 MB >= 15360 MB;
>>     mon.cluster4-monitor004 store is getting too big! 93414 MB >= 15360 MB;
>>     mon.cluster4-monitor005 store is getting too big! 88232 MB >= 15360 MB
>>     mon.cluster4-monitor001 store is getting too big! 77365 MB >= 15360 MB -- 72% avail
>>     mon.cluster4-monitor002 store is getting too big! 87868 MB >= 15360 MB -- 70% avail
>>     mon.cluster4-monitor003 store is getting too big! 30359 MB >= 15360 MB -- 85% avail
>>     mon.cluster4-monitor004 store is getting too big! 93414 MB >= 15360 MB -- 69% avail
>>     mon.cluster4-monitor005 store is getting too big! 88232 MB >= 15360 MB -- 71% avail
>>     --
>>     Kevin Sumner
>>     ke...@sumner.io
>>
>>
>>
>>         On Dec 9, 2014, at 6:20 PM, Haomai Wang <haomaiw...@gmail.com> wrote:
>>
>>         Maybe you can enable "mon_compact_on_start=true" when restarting
>>         the mon; it will compact the data.
>>
>>         On Wed, Dec 10, 2014 at 6:50 AM, Kevin Sumner <ke...@sumner.io> wrote:
>>
>>             Hi all,
>>
>>             We recently upgraded our cluster to Giant.  Since then, we’ve been
>>             driving load tests against CephFS.  However, we’re getting
>> “store is
>>             getting
>>             too big” warnings from the monitors and the mons have started
>>             consuming way
>>             more disk space, 40GB-60GB now as opposed to ~10GB
>> pre-upgrade.  Is this
>>             expected?  Is there anything I can do to ease the store’s
>> size?
>>
>>             Thanks!
>>
>>             :: ~ » ceph status
>>                 cluster f1aefa73-b968-41e0-9a28-9a465db5f10b
>>                  health HEALTH_WARN mon.cluster4-monitor001 store is
>> getting too big!
>>             45648 MB >= 15360 MB; mon.cluster4-monitor002 store is
>> getting too big!
>>             56939 MB >= 15360 MB; mon.cluster4-monitor003 store is
>> getting too big!
>>             28647 MB >= 15360 MB; mon.cluster4-monitor004 store is
>> getting too big!
>>             60655 MB >= 15360 MB; mon.cluster4-monitor005 store is
>> getting too big!
>>             57335 MB >= 15360 MB
>>                  monmap e3: 5 mons at
>>             {cluster4-monitor001=17.138.96.12:6789/0,cluster4-monitor002=17.138.96.13:6789/0,cluster4-monitor003=17.138.96.14:6789/0,cluster4-monitor004=17.138.96.15:6789/0,cluster4-monitor005=17.138.96.16:6789/0},
>>             election epoch 34938, quorum 0,1,2,3,4
>>             cluster4-monitor001,cluster4-monitor002,cluster4-
>> monitor003,cluster4-monitor004,cluster4-monitor005
>>                  mdsmap e6538: 1/1/1 up {0=cluster4-monitor001=up:active}
>>                  osdmap e49500: 501 osds: 470 up, 469 in
>>                   pgmap v1369307: 98304 pgs, 3 pools, 4933 GB data, 1976
>> kobjects
>>                         16275 GB used, 72337 GB / 93366 GB avail
>>                            98304 active+clean
>>               client io 3463 MB/s rd, 18710 kB/s wr, 7456 op/s
>>             --
>>             Kevin Sumner
>>             ke...@sumner.io
>>
>>
>>
>>
>>             _______________________________________________
>>             ceph-users mailing list
>>             ceph-users@lists.ceph.com
>>             http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>         --
>>         Best Regards,
>>
>>         Wheat
>>
>>     _______________________________________________
>>     ceph-users mailing list
>>     ceph-users@lists.ceph.com
>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>> Thanks & Regards
>> K.Mohamed Pakkeer
>> Mobile- 0091-8754410114
>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
