Hi Joao,

We followed your instructions to create the store dump:

  ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list > store.dump

Then, for the above store's location (call it $STORE):

  for m in osdmap pgmap; do
    for k in first_committed last_committed; do
      ceph-kvstore-tool $STORE get $m $k >> store.dump
    done
  done

  ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
  ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump
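If it is useful, the keys per prefix can also be counted from the dump
with something like this (a rough sketch, assuming 'list' prints one
whitespace-separated prefix/key pair per line):

  # count keys per prefix, largest first
  awk '{ print $1 }' store.dump | sort | uniq -c | sort -rn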
Please find the store dump at the following link:

http://jmp.sh/LUh6iWo

--
Thanks & Regards
K.Mohamed Pakkeer

On Mon, Feb 16, 2015 at 8:14 PM, Joao Eduardo Luis <j...@redhat.com> wrote:
> On 02/16/2015 12:57 PM, Mohamed Pakkeer wrote:
>>
>> Hi ceph-experts,
>>
>> We are getting "store is getting too big" warnings on our test cluster.
>> The cluster is running the Giant release and is configured with an EC
>> pool to test CephFS.
>>
>>     cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
>>      health HEALTH_WARN too few pgs per osd (0 < min 20); mon.master01
>> store is getting too big! 15376 MB >= 15360 MB; mon.master02 store is
>> getting too big! 15402 MB >= 15360 MB; mon.master03 store is getting
>> too big! 15402 MB >= 15360 MB; clock skew detected on mon.master02,
>> mon.master03
>>      monmap e3: 3 mons at
>> {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0},
>> election epoch 38, quorum 0,1,2 master01,master02,master03
>>      osdmap e97396: 552 osds: 552 up, 552 in
>>       pgmap v354736: 0 pgs, 0 pools, 0 bytes data, 0 objects
>>             8547 GB used, 1953 TB / 1962 TB avail
>>
>> We tried restarting the monitors with 'mon compact on start = true' as
>> well as manual compaction using 'ceph tell mon.FOO compact', but
>> neither reduced the size of store.db. We have already deleted the pools
>> and the MDS to start a fresh cluster. Do we need to delete the mons and
>> recreate them, or is there another way to reduce the store size?
>
> Could you get us a list of all the keys in the store using
> 'ceph-kvstore-tool'? Instructions are in the email you quoted.
>
> Cheers!
>
> -Joao
>
>> Regards,
>> K.Mohamed Pakkeer
>>
>> On 12/10/2014 07:30 PM, Kevin Sumner wrote:
>>
>> The mons have grown another 30GB each overnight (except for 003?),
>> which is quite worrying. I ran a little bit of testing yesterday after
>> my post, but not a significant amount.
>>
>> I wouldn't expect compact on start to help this situation, based on the
>> name, since we don't (shouldn't?) restart the mons regularly, but there
>> appears to be no documentation on it. We're pretty good on disk space
>> on the mons currently, but if that changes, I'll probably use this to
>> see about bringing these numbers in line.
>>
>> This is an issue that has been seen on larger clusters, and it usually
>> takes a monitor restart with 'mon compact on start = true', or a manual
>> compaction via 'ceph tell mon.FOO compact', to bring the monitor back
>> to a sane disk usage level.
>>
>> However, I have not been able to reproduce this in order to track down
>> the source. I'm guessing I lack the scale of the cluster, or the
>> appropriate workload (maybe both).
>>
>> What kind of workload are you running the cluster through? You mention
>> CephFS, but do you have any more info you can share that could help us
>> reproduce this state?
>>
>> Sage also fixed an issue that could potentially cause this (depending
>> on what is causing it in the first place) [1,2,3]. This bug, #9987, is
>> due to a given cached value not being updated, leading to the monitor
>> not removing unnecessary data and potentially causing this growth. This
>> cached value is set to its proper value when the monitor is restarted,
>> though, so a simple restart would have all this unnecessary data blown
>> away.
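>> (A related sanity check, as a rough sketch -- assuming your release
>> exposes these fields in 'ceph report': the osdmap trim bounds can also
>> be read off the live cluster, and a first/last gap far beyond the few
>> hundred epochs a monitor normally retains would point at trimming
>> being stuck:)
>>
>>   ceph report 2>/dev/null | grep -E '"osdmap_(first|last)_committed"'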
>> Restarting the monitor ends up masking the true cause of the store
>> growth: whether from #9987 or from obsolete data kept by the monitor's
>> backing store (leveldb), either due to misuse of leveldb or due to
>> leveldb's nature (I haven't been able to ascertain which may be at
>> fault, partly due to being unable to reproduce the problem).
>>
>> If you are up to it, I would suggest the following approach in the hope
>> of determining what may be at fault:
>>
>> 1) 'ceph tell mon.FOO compact' -- this will force the monitor to
>> compact its store. It won't close leveldb, so it won't have much effect
>> on the store size if it happens to be leveldb holding on to some data
>> (I could go into further detail, but I don't think this is the right
>> medium).
>> 1.a) you may notice the store increasing in size during this period;
>> that's expected (see the watch sketch after step 4).
>> 1.b) compaction may take a while, but in the end you'll hopefully see a
>> significant reduction in size.
>>
>> 2) Assuming that failed, I would suggest doing the following:
>>
>> 2.1) grab ceph-kvstore-tool from the ceph-test package
>> 2.2) stop the monitor
>> 2.3) run 'ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list >
>> store.dump'
>> 2.4) run the following (for the above store's location, let's call it
>> $STORE):
>>
>>   for m in osdmap pgmap; do
>>     for k in first_committed last_committed; do
>>       ceph-kvstore-tool $STORE get $m $k >> store.dump
>>     done
>>   done
>>
>>   ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
>>   ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump
>>
>> 2.5) send over the results of the dump
>> 2.6) if you were to compress the store as well and send me a link to
>> grab it, I would appreciate it.
>>
>> 3) Next you could simply restart the monitor (without 'mon compact on
>> start = true'); if the monitor's store size decreases, then there's a
>> fair chance that you've been bitten by #9987. Otherwise, it may be
>> leveldb's clutter. You should also note that leveldb may itself compact
>> automatically on start, so it's hard to say for sure what fixed what.
>>
>> 4) If the store size hasn't gone back to sane levels by now, you may
>> wish to restart with 'mon compact on start = true' and see if it helps.
>> If it doesn't, then we may have a completely different issue on our
>> hands.
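>> (For steps 1 and 4, a minimal sketch for tracking the store size from
>> another shell, using the same store path as above:)
>>
>>   # refresh the store's on-disk size every 30 seconds
>>   watch -n 30 'du -sh /var/lib/ceph/mon/ceph-FOO/store.db'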
>> Now, assuming your store size went down in step 3, and if you are
>> willing, it would be interesting to see if Sage's patches help out in
>> any way. The patches have not been backported to the giant branch yet,
>> so you would have to apply them yourself. For them to work you would
>> have to run the patched monitor as the leader. I would suggest leaving
>> the other monitors running an unpatched version so they can act as the
>> control group.
>>
>> Let us know if any of this helps.
>>
>> Cheers!
>>
>> -Joao
>>
>> [1] - http://tracker.ceph.com/issues/9987
>> [2] - 093c5f0cabeb552b90d944da2c50de48fcf6f564
>> [3] - 3fb731b722c50672a5a9de0c86a621f5f50f2d06
>>
>> :: ~ » ceph health detail | grep 'too big'
>> HEALTH_WARN mon.cluster4-monitor001 store is getting too big! 77365 MB
>> >= 15360 MB; mon.cluster4-monitor002 store is getting too big! 87868 MB
>> >= 15360 MB; mon.cluster4-monitor003 store is getting too big! 30359 MB
>> >= 15360 MB; mon.cluster4-monitor004 store is getting too big! 93414 MB
>> >= 15360 MB; mon.cluster4-monitor005 store is getting too big! 88232 MB
>> >= 15360 MB
>> mon.cluster4-monitor001 store is getting too big! 77365 MB >= 15360 MB
>> -- 72% avail
>> mon.cluster4-monitor002 store is getting too big! 87868 MB >= 15360 MB
>> -- 70% avail
>> mon.cluster4-monitor003 store is getting too big! 30359 MB >= 15360 MB
>> -- 85% avail
>> mon.cluster4-monitor004 store is getting too big! 93414 MB >= 15360 MB
>> -- 69% avail
>> mon.cluster4-monitor005 store is getting too big! 88232 MB >= 15360 MB
>> -- 71% avail
>> --
>> Kevin Sumner
>> ke...@sumner.io
>>
>> On Dec 9, 2014, at 6:20 PM, Haomai Wang <haomaiw...@gmail.com> wrote:
>>
>> Maybe you can enable "mon_compact_on_start = true" when restarting the
>> mon; it will compact the data.
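>> (A sketch of this suggestion -- the [mon] section is standard
>> ceph.conf, but the restart command depends on your init system; an
>> Ubuntu/Upstart form from this era is shown, with the mon id as a
>> placeholder:)
>>
>>   # /etc/ceph/ceph.conf on the monitor host
>>   [mon]
>>       mon compact on start = true
>>
>>   # then restart that monitor so the option takes effect
>>   sudo restart ceph-mon id=cluster4-monitor001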
>> On Wed, Dec 10, 2014 at 6:50 AM, Kevin Sumner <ke...@sumner.io> wrote:
>>
>> Hi all,
>>
>> We recently upgraded our cluster to Giant. Since then, we've been
>> driving load tests against CephFS. However, we're getting "store is
>> getting too big" warnings from the monitors, and the mons have started
>> consuming far more disk space, 40GB-60GB now as opposed to ~10GB
>> pre-upgrade. Is this expected? Is there anything I can do to ease the
>> store's size?
>>
>> Thanks!
>>
>> :: ~ » ceph status
>>     cluster f1aefa73-b968-41e0-9a28-9a465db5f10b
>>      health HEALTH_WARN mon.cluster4-monitor001 store is getting too
>> big! 45648 MB >= 15360 MB; mon.cluster4-monitor002 store is getting too
>> big! 56939 MB >= 15360 MB; mon.cluster4-monitor003 store is getting too
>> big! 28647 MB >= 15360 MB; mon.cluster4-monitor004 store is getting too
>> big! 60655 MB >= 15360 MB; mon.cluster4-monitor005 store is getting too
>> big! 57335 MB >= 15360 MB
>>      monmap e3: 5 mons at
>> {cluster4-monitor001=17.138.96.12:6789/0,cluster4-monitor002=17.138.96.13:6789/0,cluster4-monitor003=17.138.96.14:6789/0,cluster4-monitor004=17.138.96.15:6789/0,cluster4-monitor005=17.138.96.16:6789/0},
>> election epoch 34938, quorum 0,1,2,3,4
>> cluster4-monitor001,cluster4-monitor002,cluster4-monitor003,cluster4-monitor004,cluster4-monitor005
>>      mdsmap e6538: 1/1/1 up {0=cluster4-monitor001=up:active}
>>      osdmap e49500: 501 osds: 470 up, 469 in
>>       pgmap v1369307: 98304 pgs, 3 pools, 4933 GB data, 1976 kobjects
>>             16275 GB used, 72337 GB / 93366 GB avail
>>                98304 active+clean
>>   client io 3463 MB/s rd, 18710 kB/s wr, 7456 op/s
>> --
>> Kevin Sumner
>> ke...@sumner.io
>>
>> --
>> Best Regards,
>>
>> Wheat
>>
>> --
>> Thanks & Regards
>> K.Mohamed Pakkeer
>> Mobile- 0091-8754410114
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com