On Tue, Jun 16, 2015 at 12:03 PM, Jan Schermer <j...@schermer.cz> wrote:
>
>> On 16 Jun 2015, at 12:59, Gregory Farnum <g...@gregs42.com> wrote:
>>
>> On Tue, Jun 16, 2015 at 11:53 AM, Jan Schermer <j...@schermer.cz> wrote:
>>> Well, I see mons dropping out when deleting a large amount of snapshots, and
>>> it eats a _lot_ of CPU to delete them
>>
>> Well, you're getting past my expertise on the subject, but deleting
>> snapshots can sometimes be expensive, yes. If monitors are dropping
>> out, that probably means they're getting swamped by the OSDs sending in
>> updates; you might want to adjust your reporting config options
>> (osd_mon_report_interval_max, osd_mon_report_interval_min, and
>> osd_pg_stat_report_interval_max).
>>
>
> They are running fine under normal conditions, but when I run a script that
> deletes lots (say 100) of snapshots and volumes, then the mon node I'm running
> it from drops out occasionally:
>
> ceph-mon.node-14.log:2015-06-15 06:01:28.417302 7fee3945d700 -1
> mon.node-14@2(peon).paxos(paxos updating c 38322179..38322802) lease_expire
> from mon.0 172.20.1.3:6789/0 is 16.491629 seconds in the past; mons are
> probably laggy (or possibly clocks are too skewed)
>
> And during this time CPU usage on all the OSDs spikes to 200%+.
>
>>> I also had to manually schedule "compact" for the leveldb on the mons, as it
>>> stopped compacting itself. But that doesn't impact IO as far as I know (or
>>> does the mon speed actually impact IO?).
>>
>> It won't impact IO generally, unless it's blocking cluster updates.
>
> When I run the compact, the node that's told to compact drops out of quorum
> and rejoins when finished - it only runs for 10-30s twice daily, so that's
> not a big problem; I just thought I should mention it in case someone knows a
> better solution.
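To make the suggestion above concrete, the reporting intervals (and a compaction workaround) can be set in ceph.conf. The values below are illustrative examples only, not tested recommendations, and `mon compact on start` may not be available on every release - check the option names against your version (dumpling here):

```ini
[osd]
# Spread out OSD -> mon stat reporting so a burst of snapshot
# deletions doesn't swamp the monitors (example values only)
osd mon report interval min = 10
osd mon report interval max = 120
osd pg stat report interval max = 500

[mon]
# Compact the monitor's leveldb store at daemon startup, instead of
# (or in addition to) scheduled manual compactions - if supported
mon compact on start = true
```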
>
>>
>>> I can see lots of layers on the snapshotted volumes, as well as extremely
>>> large overhead with snapshots (a 2.5TB volume that actually occupies 7TB (*3)
>>> of space until even the HEAD is deleted), but that's a different story I
>>> guess...
>>
>> It's possible that you've just got enough snapshots the OSDs haven't
>> ever caught up with deleting them... not sure. :/
>
> I can see the data disappear after I delete snapshots (it would be nice to know
> the progress of the snapshot pruning - there's no indication it's doing
> anything in the background, but I can see it deleting TBs of files).
>
>>
>> What version of Ceph are you currently running?
>
> 0.67.12 dumpling.
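There's no built-in progress indicator for snapshot trimming that I know of in this release, but one rough way to watch it is to poll `ceph df --format json` periodically and diff the cluster-wide used bytes. A minimal sketch - the JSON field names and the sample figures are assumptions, not from this thread, and older releases report different keys/units:

```python
import json

def used_bytes(df_json):
    """Extract cluster-wide used bytes from parsed `ceph df --format json`.

    NOTE: the key names ("stats" / "total_used_bytes") are an assumption;
    some releases (dumpling included) use different keys and report
    kilobytes, so adjust for your version.
    """
    return df_json["stats"]["total_used_bytes"]

def trim_progress(before, after):
    """Bytes reclaimed between two `ceph df` snapshots (negative if usage grew)."""
    return used_bytes(before) - used_bytes(after)

if __name__ == "__main__":
    # In practice the snapshots would come from something like:
    #   json.loads(subprocess.check_output(["ceph", "df", "--format", "json"]))
    before = {"stats": {"total_used_bytes": 7 * 2**40}}  # hypothetical: 7 TiB used
    after = {"stats": {"total_used_bytes": 5 * 2**40}}   # hypothetical: 5 TiB used
    print("reclaimed bytes:", trim_progress(before, after))
```

Polling this in a cron loop at least shows whether the background pruning is still reclaiming space, even without per-snapshot progress.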
Ah, yeah, I believe this is all much nicer on Firefly and Hammer. :)

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com