Hm, definitely not a cronjob. I'll look at our app and see if there's anything that does something like that there.
On Wed, Jun 15, 2016 at 9:10 PM, Luke Bakken <lbak...@basho.com> wrote: > Hi Johnny, > > Since this seems to happen regularly on one node on your cluster (not > necessarily the same node), do you have a repetitive process that > performs a *lot* of updates or deletes on a single key that could be > correlated to these merges? > -- > Luke Bakken > Engineer > lbak...@basho.com > > > On Wed, Jun 15, 2016 at 10:22 AM, Johnny Tan <johnnyd...@gmail.com> wrote: > > We're running riak-1.4.2 > > > > Every few weeks, we have a riak node that starts to slowly fill up on > disk > > space for several days, and then suddenly gain that space back again. > > > > In looking into this more today, I think I see what's going on. > > > > Per the console.log on a node that it's happening to right now, there > are an > > unusually large amount of merges happening right now. There are 6 total > > nodes in our cluster, it's only happening to this node today. (In > previous > > weeks, it's been other nodes, but it's always been one node at a time.) > > > > Normally, we get 50-70 merges per day per node (according to various > nodes' > > console.log, including the node in question). Yesterday and today, the > node > > in question has several hundred merges happening. > > > > When I look inside the bitcask directory, I see a lot of files with this > set > > of permissions: > > -rwSrw-r-- > > > > My understanding is that those are files marked for deletions after > bitcask > > merging. > > > > The number of those files is currently growing, and from a spot-check, > they > > indeed match up as the files that have been merged. > > > > So it seems the two are related: a lot of merges are happening, which > then > > causes a large number of files to be marked for deletion, and those > marked > > files are piling up and not getting deleted for some reason. > > > > If I don't do anything, those files eventually get deleted, and > everything > > is good again for another couple weeks until it happens to another node. > But > > the disk usage does get high enough to alert us, and obviously we don't > want > > it to get anywhere near 100%. > > > > > > I'm trying to figure out why there are times when this happens. One > thing I > > noticed is a difference in the merge log entries. > > > > Here's one from a "normal" day, nearly all the entries for that day are > > roughly this same length and same amount of time merging: > > 2016-06-10 05:27:39.426 UTC [info] <0.15230.160> Merged > > > {["/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/84000.bitcask.data","/var/lib/riak/bitcask/890602560248518965780370444936484965102833893376/83999.bitcask.data"],[]} > > in 11.902028 seconds. > > > > But here's one from today on the problematic node: > > 2016-06-15 17:13:40.626 UTC [info] <0.17903.500> Merged > > > {["/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83633.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83632.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83631.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83630.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83629.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83628.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83627.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83626.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83625.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83624.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83623.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83622.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83621.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83620.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83619.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83618.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83617.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83616.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83615.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83614.bitcask.data","/var/lib/riak/bitcask/1233142006497949337234359077604363797834693083136/83613.bitcask.data","/var/lib/riak/bitcask/12331420064979493372343590776043637978346930...",...],...} > > in 220.186043 seconds. > > > > It's not just that it takes 20x longer to merge, but it seems to be > doing a > > lot more files at once. > > > > What is going on? > > > > I'm not sure how much of the app.config is relevant, but I'll at least > paste > > just the bitcask and merge sections for now: > > {bitcask, [ > > {data_root, "/var/lib/riak/bitcask"}, > > {dead_bytes_merge_trigger, 268435456}, > > {dead_bytes_threshold, 67108864}, > > {frag_merge_trigger, 60}, > > {frag_threshold, 40}, > > {io_mode, erlang}, > > {max_file_size, 1073741824}, > > {small_file_threshold, 134217728} > > ]}, > > {merge_index, [ > > {buffer_rollover_size, 1048576}, > > {data_root, "/var/lib/riak/merge_index"}, > > {max_compact_segments, 20} > > ]}, > > > > > > Thanks for any insight, > > johnny > > > > _______________________________________________ > > riak-users mailing list > > riak-users@lists.basho.com > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com > > >
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com