Hi Matthew! I have the possibility of moving the anti-entropy directory data to a mechanical 7200 RPM disk that exists on each of the machines. I was thinking of changing the anti-entropy data dir setting in the app.config file and restarting the riak process.
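
If I understood the docs correctly, the change would be something along these lines in the riak_kv section of app.config (the path is only an example for the new disk's mount point):

    {riak_kv, [
        %% ... other riak_kv settings stay as they are ...
        {anti_entropy, {on, []}},
        %% point the AAE hash trees at the 7200 RPM disk (example mount point)
        {anti_entropy_data_dir, "/mnt/hdd7200/riak/anti_entropy"}
    ]}

...and then restart the riak process on each node, one at a time.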
Is there any problem using a mechanical disk to store the anti-entropy data?

Best regards!

On 8 April 2014 23:58, Edgar Veiga <edgarmve...@gmail.com> wrote:

> I'll wait a few more days, see if the AAE maybe "stabilises", and only after that make a decision regarding this.
> The cluster expansion was on the roadmap, but not right now :)
>
> I've attached a few screenshots; you can clearly observe the evolution of one of the machines after the anti-entropy data removal and consequent restart (5th of April).
>
> https://cloudup.com/cB0a15lCMeS
>
> Best regards!
>
> On 8 April 2014 23:44, Matthew Von-Maszewski <matth...@basho.com> wrote:
>
>> No. I do not see a problem with your plan. But ...
>>
>> I would prefer to see you add servers to your cluster. Scalability is one of Riak's fundamental characteristics. As your database needs grow, we grow with you ... just add another server and migrate some of the vnodes there.
>>
>> I obviously cannot speak to your budgetary constraints. All of the engineers at Basho, I am just one, are focused upon providing you performance and features along with your scalability needs. This seems to be a situation where you might be sacrificing data integrity where another server or two would address the situation.
>>
>> And if 2.0 makes things better ... sell the extra servers on eBay.
>>
>> Matthew
>>
>> On Apr 8, 2014, at 6:31 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>
>> Thanks Matthew!
>>
>> Today this situation has become unsustainable. In two of the machines I have an anti-entropy dir of 250G... It just keeps growing and growing, and I'm almost reaching the max size of the disks.
>>
>> Maybe I'll just turn off AAE in the cluster, remove all the data in the anti-entropy directory and wait for v2 of Riak. Do you see any problem with this?
>>
>> Best regards!
>>
>> On 8 April 2014 22:11, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>
>>> Edgar,
>>>
>>> Today we disclosed a new feature for Riak's leveldb, Tiered Storage. The details are here:
>>>
>>> https://github.com/basho/leveldb/wiki/mv-tiered-options
>>>
>>> This feature might give you another option in managing your storage volume.
>>>
>>> Matthew
>>>
>>> On Apr 8, 2014, at 11:07 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>
>>> It makes sense, I do a lot, and I really mean a LOT of updates per key, maybe thousands a day! The cluster is experiencing far more updates to existing keys than new keys being inserted.
>>>
>>> The hash trees will rebuild during the next weekend (normally it takes about two days to complete the operation), so I'll come back and give you some feedback (hopefully good) next Monday!
>>>
>>> Again, thanks a lot, you've been very helpful.
>>> Edgar
>>>
>>> On 8 April 2014 15:47, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>
>>>> Edgar,
>>>>
>>>> The test I have running currently has reached 1 billion keys. It is running against a single node with N=1. It has 42G of AAE data. Here is my extrapolation to compare your numbers:
>>>>
>>>> You have ~2.5 billion keys. I assume you are running N=3 (the default). AAE therefore is actually tracking ~7.5 billion keys. You have six nodes, therefore tracking ~1.25 billion keys per node.
>>>>
>>>> Raw math would suggest that my 42G of AAE data for 1 billion keys would extrapolate to 52.5G of AAE data for you. Yet you have ~120G of AAE data. Is something wrong? No.
>>>> My data is still loading and has experienced zero key/value updates/edits.
>>>>
>>>> AAE hashes get rewritten every time a user updates the value of a key. AAE's leveldb is just like the user leveldb: all prior values of a key accumulate in the .sst table files until compaction removes duplicates. Similarly, a user delete of a key causes a delete tombstone in the AAE hash tree. Those delete tombstones have to await compactions too before leveldb recovers the disk space.
>>>>
>>>> AAE's hash trees rebuild weekly. I am told that the rebuild operation will actually destroy the existing files and start over. That is when you should see AAE space usage dropping dramatically.
>>>>
>>>> Matthew
>>>>
>>>> On Apr 8, 2014, at 9:31 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>
>>>> Thanks a lot Matthew!
>>>>
>>>> A little bit more info: I've gathered a sample of the contents of the anti-entropy data on one of my machines:
>>>> - 44 folders whose names match the folder names in the level-db dir (i.e. 393920363186844927172086927568060657641638068224/)
>>>> - each folder has 5 files (LOG, CURRENT, etc.) and 5 sst_* folders
>>>> - the biggest sst folder is sst_3 with 4.3G
>>>> - inside the sst_3 folder there are 1219 files named 00*****.sst
>>>> - each of the 00*****.sst files is ~3.7M
>>>>
>>>> Hope this info gives you some more help!
>>>>
>>>> Best regards, and again, thanks a lot
>>>> Edgar
>>>>
>>>> On 8 April 2014 13:24, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>
>>>>> Argh. Missed where you said you had upgraded. OK, I will proceed with getting you comparison numbers.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>
>>>>> Thanks again Matthew, you've been very helpful!
>>>>>
>>>>> Maybe you can give me some kind of advice on this issue I'm having since I upgraded to 1.4.8.
>>>>>
>>>>> Since I upgraded, my anti-entropy data has been growing a lot and has only stabilised at very high values... Right now my cluster has 6 machines, each one with ~120G of anti-entropy data and 600G of level-db data. This seems to be quite a lot, no? My total amount of keys is ~2.5 billion.
>>>>>
>>>>> Best regards,
>>>>> Edgar
>>>>>
>>>>> On 6 April 2014 23:30, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>
>>>>>> Edgar,
>>>>>>
>>>>>> This is indirectly related to your key deletion discussion. I made changes recently to the aggressive delete code. The second section of the following (updated) web page discusses the adjustments:
>>>>>>
>>>>>> https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>>>>>
>>>>>> Matthew
>>>>>>
>>>>>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>
>>>>>> Matthew, thanks again for the response!
>>>>>>
>>>>>> That said, I'll wait for 2.0 (and maybe buy some bigger disks :)
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> On 6 April 2014 15:02, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>
>>>>>>> Edgar,
>>>>>>>
>>>>>>> In Riak 1.4, there is no advantage to using empty values versus deleting.
>>>>>>>
>>>>>>> leveldb is a "write once" data store. New data for a given key never physically overwrites old data for the same key. New data "hides" the old data by being in a lower level, and is therefore picked first.
>>>>>>>
>>>>>>> leveldb's compaction operation will remove older key/value pairs only when the newer key/value pair is part of a compaction involving both new and old. The new and the old key/value pairs must have migrated to adjacent levels through normal compaction operations before leveldb will see them in the same compaction. The migration could take days, weeks, or even months depending upon the size of your entire dataset and the rate of incoming write operations.
>>>>>>>
>>>>>>> leveldb's "delete" object is exactly the same as your empty JSON object. The delete object simply has one more flag set that allows it to also be removed, if and only if there is no chance for an identical key to exist on a higher level.
>>>>>>>
>>>>>>> I apologize that I cannot give you a more useful answer. 2.0 is on the horizon.
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi again!
>>>>>>>
>>>>>>> Sorry to reopen this discussion, but I have another question regarding the former post.
>>>>>>>
>>>>>>> What if, instead of doing a mass deletion (we've already seen that it won't pay off in terms of disk space), I update all the values with an empty JSON object "{}"? Do you see any problem with this? I no longer need those millions of values that are living in the cluster...
>>>>>>>
>>>>>>> When version 2.0 of Riak is stable I'll do the update, and only then delete those keys!
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> On 18 February 2014 16:32, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ok, thanks a lot Matthew.
>>>>>>>>
>>>>>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>>
>>>>>>>>> Riak 2.0 is coming. Hold your mass delete until then. The "bug" is within Google's original leveldb architecture. Riak 2.0 sneaks around it to get the disk space freed.
>>>>>>>>>
>>>>>>>>> Matthew
>>>>>>>>>
>>>>>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> The only/main purpose is to free disk space..
>>>>>>>>>
>>>>>>>>> I was a little bit concerned regarding this operation, but now with your feedback I'm leaning towards not doing anything; I can't risk the growth in disk space...
>>>>>>>>> Regarding the overhead, I think that with a tight throttling system I could control it and avoid overloading the cluster.
>>>>>>>>>
>>>>>>>>> Mixed feelings :S
>>>>>>>>>
>>>>>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>>>
>>>>>>>>>> Edgar,
>>>>>>>>>>
>>>>>>>>>> The first "concern" I have is that leveldb's delete does not free disk space. Others have executed mass delete operations only to discover they are now using more disk space instead of less. Here is a discussion of the problem:
>>>>>>>>>>
>>>>>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>>>>>
>>>>>>>>>> The link also describes Riak's database operation overhead. This is a second "concern". You will need to carefully throttle your delete rate or the overhead will likely impact your production throughput.
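
(For reference, the throttled delete script I had in mind is roughly the sketch below. It is purely illustrative: the bucket name, key file, node address and pacing are placeholders that would have to be tuned against the production load.)

    import time
    import urllib.error
    import urllib.parse
    import urllib.request

    # Placeholder bucket and node; keys.txt holds one key per line.
    BASE_URL = "http://127.0.0.1:8098/buckets/mybucket/keys/"
    DELAY_SECONDS = 0.01  # at most ~100 deletes/sec; tune against cluster load

    with open("keys.txt") as key_file:
        for line in key_file:
            key = line.strip()
            if not key:
                continue
            url = BASE_URL + urllib.parse.quote(key, safe="")
            request = urllib.request.Request(url, method="DELETE")
            try:
                urllib.request.urlopen(request)
            except urllib.error.HTTPError as err:
                if err.code != 404:  # a key that is already gone is fine
                    print("delete failed for", key, "->", err.code)
            time.sleep(DELAY_SECONDS)  # crude throttle between requests
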
>>>>>>>>>>
>>>>>>>>>> We have new code to help quicken the actual purge of deleted data in Riak 2.0. But that release is not quite ready for production usage.
>>>>>>>>>>
>>>>>>>>>> What do you hope to achieve by the mass delete?
>>>>>>>>>>
>>>>>>>>>> Matthew
>>>>>>>>>>
>>>>>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Sorry, forgot that info!
>>>>>>>>>>
>>>>>>>>>> It's leveldb.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Which Riak backend are you using: bitcask, leveldb, multi?
>>>>>>>>>>>
>>>>>>>>>>> Matthew
>>>>>>>>>>>
>>>>>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> > Hi all!
>>>>>>>>>>> >
>>>>>>>>>>> > I have a fairly trivial question regarding mass deletion on a Riak cluster, but first let me give you some context. My cluster is running Riak 1.4.6 on 6 machines with a ring size of 256 and 1TB SSD disks.
>>>>>>>>>>> >
>>>>>>>>>>> > I need to execute a massive object deletion on a bucket; I'm talking about ~1 billion keys (the average object size is ~1KB). I will not retrieve the keys from Riak because I have a file with all of them. I'll just start a script that reads them from the file and triggers an HTTP DELETE for each one.
>>>>>>>>>>> > The cluster will continue running in production with quite a high load, serving all other applications, while this deletion is running.
>>>>>>>>>>> >
>>>>>>>>>>> > My question is simple: do I need to have any extra concerns regarding this action? Do you advise me to pay special attention to any metrics regarding Riak, or even the servers where it's running?
>>>>>>>>>>> >
>>>>>>>>>>> > Best regards!