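That's a hex number: 16#80000000 is Erlang's Base#Value integer notation, so it reads as hexadecimal 80000000, or 2147483648 decimal (2^31 bytes, i.e. 2 GB).

Chad DePue
inakanetworks.com - development consulting | skype cdepue | @chaddepue
+1 206.866.5707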
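For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It is not from the thread or from Bitcask; the helper name parse_erlang_int is made up for illustration and only handles plain Base#Digits literals like the one in bitcask.app.

```python
# Illustrative helper (hypothetical name): Erlang writes integers in other
# bases as Base#Digits, so 16#80000000 means hexadecimal digits 80000000.
def parse_erlang_int(literal: str) -> int:
    """Parse an Erlang-style integer literal such as '16#80000000' or '42'."""
    if "#" in literal:
        base, digits = literal.split("#", 1)
        return int(digits, int(base))
    return int(literal)


max_file_size = parse_erlang_int("16#80000000")

print(max_file_size)             # 2147483648
print(max_file_size == 2**31)    # True
print(max_file_size / 1024**3)   # 2.0 (size in GiB)
```

So the value is a single 2 GB limit per data file, not "16 files of 80MB each".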
On Mon, Jun 13, 2011 at 7:08 PM, Steve Webb <sw...@gnip.com> wrote:
> Dan -
>
> Q: What does the syntax: 16#80000000 represent in the max_file_size
> parameter? It's supposed to be 2GB, but I can't see where that means 2GB
> anywhere.
>
> Even if that meant 16 files of 80MB each, that only comes out to slightly
> over 1GB.
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
>
> On Mon, 13 Jun 2011, Dan Reverri wrote:
>
>> Hi Steve,
>>
>> The article points out that the active data file is not considered during
>> merge checks. Your 250-ish MB data file is the active file and not
>> considered during the merge check. The file will eventually roll over to a
>> non-active file when it hits 2 GB in size. Once the file is not active it
>> will be considered during the merge check and merging will take place.
>>
>> The 2 GB file size is configurable via the max_file_size parameter:
>> https://github.com/basho/bitcask/blob/master/ebin/bitcask.app#L22
>>
>> Thanks,
>> Dan
>>
>> Daniel Reverri
>> Developer Advocate
>> Basho Technologies, Inc.
>> d...@basho.com
>>
>> On Mon, Jun 13, 2011 at 2:38 PM, Steve Webb <sw...@gnip.com> wrote:
>>
>>> Dan -
>>>
>>> I've got dead_bytes_threshold=5242880 (5M) and
>>> dead_bytes_merge_trigger=10242880. My bitcask *.data files are 250-ish MB
>>> in size:
>>>
>>> root@ha2:/data/riaksearch/bitcask/1027618338748291114361965898003636498195577569280# ls -lah
>>> total 771M
>>> drwxr-xr-x  2 riak riak 4.0K 2011-06-12 01:08 .
>>> drwxr-xr-x 34 riak riak 4.0K 2011-06-12 01:10 ..
>>> -rw-------  1 riak riak 229M 2011-06-08 13:11 1307415077.bitcask.data
>>> -rw-r--r--  1 riak riak 4.3M 2011-06-08 13:11 1307415077.bitcask.hint
>>> -rw-------  1 riak riak 276M 2011-06-10 13:30 1307562153.bitcask.data
>>> -rw-r--r--  1 riak riak 5.1M 2011-06-10 13:30 1307562153.bitcask.hint
>>> -rw-------  1 riak riak 1.4M 2011-06-08 13:45 1307562333.bitcask.data
>>> -rw-r--r--  1 riak riak  27K 2011-06-08 13:45 1307562333.bitcask.hint
>>> -rw-------  1 riak riak 246M 2011-06-13 15:34 1307862506.bitcask.data
>>> -rw-r--r--  1 riak riak 9.4M 2011-06-13 15:34 1307862506.bitcask.hint
>>> -rw-------  1 riak riak  107 2011-06-12 01:08 bitcask.write.lock
>>>
>>> I'm pretty sure that 50% or more of the data in these files should've
>>> aged off by now and the merge trigger should've happened. The article
>>> shows why merges happen when a restart is done, but it doesn't really
>>> explain why merges don't happen at normal runtime.
>>>
>>> I really don't want to restart riak every day to merge files.
>>>
>>> Q: What are some good trigger settings for my use case?
>>>
>>> I want to collect and store 1 day's worth of tweets from the twitter
>>> spritzer feed and have the data files auto-merge once in a while (once a
>>> day or more frequently) when they've gotten 10% of 'dead' data in them
>>> (aka, the tweets expire after 1 day).
>>>
>>> - Steve
>>>
>>> --
>>> Steve Webb - Senior System Administrator for gnip.com
>>> http://twitter.com/GnipWebb
>>>
>>> On Mon, 13 Jun 2011, Dan Reverri wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> This Knowledge Base article may be related:
>>>>
>>>> https://help.basho.com/entries/20141178-why-does-it-seem-that-bitcask-merging-is-only-triggered-when-a-riak-node-is-restarted
>>>>
>>>> Thanks,
>>>> Dan
>>>>
>>>> Daniel Reverri
>>>> Developer Advocate
>>>> Basho Technologies, Inc.
>>>> d...@basho.com
>>>>
>>>> On Mon, Jun 13, 2011 at 10:25 AM, Steve Webb <sw...@gnip.com> wrote:
>>>>
>>>>> Justin -
>>>>>
>>>>> My current bitcask settings are:
>>>>>
>>>>> %% Bitcask Config
>>>>> {bitcask, [
>>>>>     {data_root, "/var/lib/riaksearch/bitcask" },
>>>>>     {dead_bytes_merge_trigger, 10242880 },
>>>>>     {dead_bytes_threshold, 5242880 },
>>>>>     {expiry_secs, 86400}
>>>>> ]},
>>>>>
>>>>> My understanding is that these settings mean the data should
>>>>> auto-expire after one day. Also, once each bitcask file in
>>>>> .../riaksearch/bitcask/xxx/*.data has 10M of "dead" or expired data
>>>>> in it, it should be merged, right?
>>>>>
>>>>> I'm collecting the spritzer twitter stream and loading it into two
>>>>> buckets (one non-indexed bucket holds the full tweet, one indexed
>>>>> bucket holds the tweet string, id, date and username). I used to see
>>>>> about 10 GB of data total, but it's growing and currently at 26GB of
>>>>> data total.
>>>>>
>>>>> I'm seeing these in the logs:
>>>>>
>>>>> =INFO REPORT==== 13-Jun-2011::08:28:19 ===
>>>>> Pid <0.6844.0> compacted 3 segments for 942232 bytes in 4.900694 seconds, 0.18 MB/sec
>>>>>
>>>>> =INFO REPORT==== 13-Jun-2011::08:29:01 ===
>>>>> Pid <0.6267.0> compacted 3 segments for 1721790 bytes in 9.690511 seconds, 0.17 MB/sec
>>>>>
>>>>> =INFO REPORT==== 13-Jun-2011::08:31:23 ===
>>>>> Pid <0.6924.0> compacted 3 segments for 6988416 bytes in 44.659753 seconds, 0.15 MB/sec
>>>>>
>>>>> ... but I'm not seeing any "merging" related entries.
>>>>>
>>>>> - Steve
>>>>>
>>>>> --
>>>>> Steve Webb - Senior System Administrator for gnip.com
>>>>> http://twitter.com/GnipWebb
>>>>>
>>>>> On Wed, 8 Jun 2011, Justin Sheehy wrote:
>>>>>
>>>>>> Hi, Steve.
>>>>>>
>>>>>> Check out this page:
>>>>>>
>>>>>> http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings
>>>>>>
>>>>>> Basically, a "merge trigger" must be met in order to have the merge
>>>>>> process occur. When it does occur, it will affect all existing files
>>>>>> that meet a "merge threshold."
>>>>>>
>>>>>> One note that is relevant for your specific use: the expiry_secs
>>>>>> parameter will cause a given item to disappear from the client API
>>>>>> immediately after expiry, and to be cleaned if it is in a file
>>>>>> already being merged, but will not currently contribute toward merge
>>>>>> triggers or thresholds on its own if not otherwise "dead".
>>>>>>
>>>>>> -Justin
>>>>>>
>>>>>> On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:
>>>>>>
>>>>>>> Hello there.
>>>>>>>
>>>>>>> I'm curious - I'm up to about 10GB of storage and I'm guessing that
>>>>>>> I'll be full in 3-4 more days of ingesting data. I have no idea
>>>>>>> if/when a merge will run to expire the older data.
>>>>>>>
>>>>>>> I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch
>>>>>>> cluster with the spritzer twitter feed. I used the bitcask
>>>>>>> 'expiry_secs' to expire data after 3 days.
>>>>>>>
>>>>>>> Q: Is there a method or command to force a merge at any time?
>>>>>>> Q: Is there a way to run a merge when the storage size reaches a
>>>>>>> specific threshold?
>>>>>>>
>>>>>>> - Steve
>>>>>>>
>>>>>>> --
>>>>>>> Steve Webb - Senior System Administrator for gnip.com
>>>>>>> http://twitter.com/GnipWebb
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com