Thanks, I'll start the process and give you guys some feedback in the
meantime.

The plan is:
1 - Disable AAE in the cluster via riak attach:


a.
rpc:multicall(riak_kv_entropy_manager, disable, []).
rpc:multicall(riak_kv_entropy_manager, cancel_exchanges, []).
z.

2 - Update app.config, changing the AAE dir to point at the mechanical disk
(see the sketch below);

3 - Restart the Riak process on each machine, one by one;

4 - Remove the old AAE data.
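
For step 2, I'm assuming the relevant key is anti_entropy_data_dir in the
riak_kv section of app.config, something like this (the path is just an
example for the mechanical disk):

%% app.config excerpt (sketch only; other riak_kv settings stay as they are)
{riak_kv, [
    {anti_entropy_data_dir, "/mnt/mechanical/riak/anti_entropy"}
]}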

By the way, I've seen different ways of disabling AAE via riak attach here
on the list... The sequence above seems to be the most complete. What do the
a. and z. stand for? I've been disabling AAE just by running
"rpc:multicall(riak_kv_entropy_manager, disable, []).". Is there any
difference if we ignore the a., the z. and the cancel_exchanges call?
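
For reference, here is the full sequence as I understand it (the enable call
at the end is my own assumption, for turning AAE back on once the new dir is
in place):

rpc:multicall(riak_kv_entropy_manager, disable, []).
rpc:multicall(riak_kv_entropy_manager, cancel_exchanges, []).
%% later, after the move and the rolling restarts:
rpc:multicall(riak_kv_entropy_manager, enable, []).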

Best regards!



On 10 April 2014 13:41, Matthew Von-Maszewski <matth...@basho.com> wrote:

> Yes, you can send the AAE (active anti-entropy) data to a different disk.
>
> AAE calculates a hash each time you PUT new data to the regular database.
>  AAE then buffers around 1,000 hashes (I forget the exact value) to write
> as a block to the AAE database.  The AAE write is NOT in series with the
> user database writes.  Your throughput should not be impacted.  But this is
> not something I have personally measured/validated.
>
> Matthew
>
>
> On Apr 10, 2014, at 7:33 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>
> Hi Matthew!
>
> I have the possibility of moving the data in the anti-entropy directory to a
> mechanical 7200 RPM disk that exists on each of the machines. I was thinking
> of changing the anti_entropy data dir setting in the app.config file and
> restarting the Riak process.
>
> Is there any problem with using a mechanical disk to store the anti-entropy
> data?
>
> Best regards!
>
>
> On 8 April 2014 23:58, Edgar Veiga <edgarmve...@gmail.com> wrote:
>
>> I'll wait a few more days to see if the AAE data maybe "stabilises", and
>> only after that make a decision regarding this.
>> Expanding the cluster was on the roadmap, but not right now :)
>>
>> I've attached a few screenshots; you can clearly observe the evolution of
>> one of the machines after the anti-entropy data removal and consequent
>> restart (5th of April).
>>
>> https://cloudup.com/cB0a15lCMeS
>>
>> Best regards!
>>
>>
>> On 8 April 2014 23:44, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>
>>> No.  I do not see a problem with your plan.  But ...
>>>
>>> I would prefer to see you add servers to your cluster.  Scalability is
>>> one of Riak's fundamental characteristics.  As your database needs grow, we
>>> grow with you ... just add another server and migrate some of the vnodes
>>> there.
>>>
>>> I obviously cannot speak to your budgetary constraints.  All of the
>>> engineers at Basho (I am just one) are focused on providing you
>>> performance and features along with your scalability needs.  This seems to
>>> be a situation where you might be sacrificing data integrity when another
>>> server or two would address the situation.
>>>
>>> And if 2.0 makes things better ... sell the extra servers on eBay.
>>>
>>> Matthew
>>>
>>>
>>> On Apr 8, 2014, at 6:31 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>
>>> Thanks Matthew!
>>>
>>> Today this situation has become unsustainable. In two of the machines I
>>> have an anti-entropy dir of 250G... It just keeps growing and growing, and
>>> I'm almost reaching the max size of the disks.
>>>
>>> Maybe I'll just turn off AAE in the cluster, remove all the data in the
>>> anti-entropy directory, and wait for v2 of Riak. Do you see any problem
>>> with this?
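>>>
>>> By "turn off AAE" I mean flipping the anti_entropy setting in app.config
>>> before wiping the directory, roughly this (a sketch):
>>>
>>> {riak_kv, [
>>>     {anti_entropy, {off, []}}
>>> ]}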
>>>
>>> Best regards!
>>>
>>>
>>> On 8 April 2014 22:11, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>
>>>> Edgar,
>>>>
>>>> Today we disclosed a new feature for Riak's leveldb, Tiered Storage.
>>>>  The details are here:
>>>>
>>>> https://github.com/basho/leveldb/wiki/mv-tiered-options
>>>>
>>>> This feature might give you another option in managing your storage
>>>> volume.
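>>>>
>>>> The knobs go in the eleveldb section of app.config, roughly like the
>>>> sketch below (option names quoted from memory; please verify them against
>>>> the wiki page before using them):
>>>>
>>>> %% sketch only; option names from memory, paths are example mount points
>>>> {eleveldb, [
>>>>     {tiered_slow_level, 4},            %% levels >= 4 live on the slow disk
>>>>     {tiered_fast_prefix, "/mnt/ssd"},
>>>>     {tiered_slow_prefix, "/mnt/slow"}
>>>> ]}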
>>>>
>>>>
>>>> Matthew
>>>>
>>>> On Apr 8, 2014, at 11:07 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>
>>>> It makes sense; I do a lot, and I really mean a LOT, of updates per key,
>>>> maybe thousands a day! The cluster is experiencing far more updates to
>>>> existing keys than new keys being inserted.
>>>>
>>>> The hash trees will rebuild during the next weekend (normally it takes
>>>> about two days to complete the operation), so I'll come back and give you
>>>> some feedback (hopefully good) next Monday!
>>>>
>>>> Again, thanks a lot, you've been very helpful.
>>>> Edgar
>>>>
>>>>
>>>> On 8 April 2014 15:47, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>
>>>>> Edgar,
>>>>>
>>>>> The test I have running currently has reached 1 Billion keys.  It is
>>>>> running against a single node with N=1.  It has 42G of AAE data.  Here is
>>>>> my extrapolation to compare with your numbers:
>>>>>
>>>>> You have ~2.5 Billion keys.  I assume you are running N=3 (the
>>>>> default).  AAE therefore is actually tracking ~7.5 Billion keys.  You have
>>>>> six nodes, therefore tracking ~1.25 Billion keys per node.
>>>>>
>>>>> Raw math would suggest that my 42G of AAE data for 1 billion keys
>>>>> would extrapolate to 52.5G of AAE data for you.  Yet you have ~120G of AAE
>>>>> data.  Is something wrong?  No.  My data is still loading and has
>>>>> experienced zero key/value updates/edits.
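>>>>>
>>>>> Spelled out (erl shell, rough numbers):
>>>>>
>>>>> %% keys tracked per node: total keys * replicas / nodes
>>>>> 2.5e9 * 3 / 6.             %% => 1.25e9
>>>>> %% scale the 42G-per-billion-keys observation to that
>>>>> 42 * (1.25e9 / 1.0e9).     %% => 52.5 (GB of AAE data expected per node)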
>>>>>
>>>>> AAE hashes get rewritten every time a user updates the value of a key.
>>>>>  AAE's leveldb is just like the user leveldb: all prior values of a key
>>>>> accumulate in the .sst table files until compaction removes duplicates.
>>>>>  Similarly, a user delete of a key causes a delete tombstone in the AAE
>>>>> hash tree.  Those delete tombstones have to await compactions too before
>>>>> leveldb recovers the disk space.
>>>>>
>>>>> AAE's hash trees rebuild weekly.  I am told that the rebuild operation
>>>>> will actually destroy the existing files and start over.  That is when you
>>>>> should see AAE space usage dropping dramatically.
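>>>>>
>>>>> The rebuild interval is the anti_entropy_expire setting (in milliseconds)
>>>>> in the riak_kv section of app.config, if memory serves; a sketch:
>>>>>
>>>>> {riak_kv, [
>>>>>     {anti_entropy_expire, 604800000}   %% one week (I believe the default)
>>>>> ]}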
>>>>>
>>>>> Matthew
>>>>>
>>>>>
>>>>> On Apr 8, 2014, at 9:31 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>
>>>>> Thanks a lot Matthew!
>>>>>
>>>>> A little bit more info: I've gathered a sample of the contents of the
>>>>> anti-entropy data on one of my machines:
>>>>> - 44 folders, with names equal to the names of the folders in the
>>>>> level-db dir (e.g. 393920363186844927172086927568060657641638068224/)
>>>>> - each folder has 5 files (LOG, CURRENT, etc.) and 5 sst_* folders
>>>>> - the biggest sst folder is sst_3, with 4.3G
>>>>> - inside the sst_3 folder there are 1219 files named 00****.sst
>>>>> - each of the 00*****.sst files is ~3.7M
>>>>>
>>>>> Hope this info gives you some more help!
>>>>>
>>>>> Best regards, and again, thanks a lot
>>>>> Edgar
>>>>>
>>>>>
>>>>> On 8 April 2014 13:24, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>
>>>>>> Argh. Missed where you said you had upgraded. Ok, I will proceed with
>>>>>> getting you comparison numbers.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Apr 8, 2014, at 6:51 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thanks again Matthew, you've been very helpful!
>>>>>>
>>>>>> Maybe you can give me some kind of advice on this issue I'm having
>>>>>> since I've upgraded to 1.4.8.
>>>>>>
>>>>>> Since I've upgraded, my anti-entropy data has been growing a lot and
>>>>>> has only stabilised at very high values... Right now my cluster has 6
>>>>>> machines, each with ~120G of anti-entropy data and 600G of level-db
>>>>>> data. This seems to be quite a lot, no? My total amount of keys is ~2.5
>>>>>> billion.
>>>>>>
>>>>>> Best regards,
>>>>>> Edgar
>>>>>>
>>>>>> On 6 April 2014 23:30, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>
>>>>>>> Edgar,
>>>>>>>
>>>>>>> This is indirectly related to your key deletion discussion.  I made
>>>>>>> changes recently to the aggressive delete code.  The second section of
>>>>>>> the following (updated) web page discusses the adjustments:
>>>>>>>
>>>>>>>     https://github.com/basho/leveldb/wiki/Mv-aggressive-delete
>>>>>>>
>>>>>>> Matthew
>>>>>>>
>>>>>>>
>>>>>>> On Apr 6, 2014, at 4:29 PM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Matthew, thanks again for the response!
>>>>>>>
>>>>>>> That said, I'll wait again for the 2.0 (and maybe buy some bigger
>>>>>>> disks :)
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>>
>>>>>>> On 6 April 2014 15:02, Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>>>
>>>>>>>> Edgar,
>>>>>>>>
>>>>>>>> In Riak 1.4, there is no advantage to using empty values versus
>>>>>>>> deleting.
>>>>>>>>
>>>>>>>> leveldb is a "write once" data store.  New data for a given key never
>>>>>>>> physically overwrites old data for the same key.  New data "hides" the
>>>>>>>> old data by being in a lower level, and therefore picked first.
>>>>>>>>
>>>>>>>> leveldb's compaction operation will remove older key/value pairs only
>>>>>>>> when the newer key/value pair is part of a compaction involving both
>>>>>>>> new and old.  The new and the old key/value pairs must have migrated to
>>>>>>>> adjacent levels through normal compaction operations before leveldb
>>>>>>>> will see them in the same compaction.  The migration could take days,
>>>>>>>> weeks, or even months depending upon the size of your entire dataset
>>>>>>>> and the rate of incoming write operations.
>>>>>>>>
>>>>>>>> leveldb's "delete" object is exactly the same as your empty JSON
>>>>>>>> object.  The delete object simply has one more flag set that allows it
>>>>>>>> to also be removed if and only if there is no chance for an identical
>>>>>>>> key to exist on a higher level.
>>>>>>>>
>>>>>>>> I apologize that I cannot give you a more useful answer.  2.0 is on
>>>>>>>> the horizon.
>>>>>>>>
>>>>>>>> Matthew
>>>>>>>>
>>>>>>>>
>>>>>>>> On Apr 6, 2014, at 7:04 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi again!
>>>>>>>>
>>>>>>>> Sorry to reopen this discussion, but I have another question
>>>>>>>> regarding my earlier post.
>>>>>>>>
>>>>>>>> What if, instead of doing a mass deletion (we've already seen that it
>>>>>>>> won't pay off in terms of disk space), I update all the values with an
>>>>>>>> empty JSON object "{}"? Do you see any problem with this? I no longer
>>>>>>>> need those millions of values that are living in the cluster...
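>>>>>>>>
>>>>>>>> Concretely, I mean replacing each DELETE with a PUT of "{}" over HTTP,
>>>>>>>> e.g. from an erl shell (host, bucket and key are just placeholders):
>>>>>>>>
>>>>>>>> %% sketch only
>>>>>>>> inets:start(),
>>>>>>>> httpc:request(put,
>>>>>>>>     {"http://127.0.0.1:8098/buckets/mybucket/keys/somekey",
>>>>>>>>      [], "application/json", "{}"},
>>>>>>>>     [], []).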
>>>>>>>>
>>>>>>>> When version 2.0 of Riak is running stable I'll do the update and
>>>>>>>> only then delete those keys!
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 February 2014 16:32, Edgar Veiga <edgarmve...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ok, thanks a lot Matthew.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 18 February 2014 16:18, Matthew Von-Maszewski <
>>>>>>>>> matth...@basho.com> wrote:
>>>>>>>>>
>>>>>>>>>> Riak 2.0 is coming.  Hold your mass delete until then.  The "bug" is
>>>>>>>>>> within Google's original leveldb architecture.  Riak 2.0 sneaks around
>>>>>>>>>> to get the disk space freed.
>>>>>>>>>>
>>>>>>>>>> Matthew
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Feb 18, 2014, at 11:10 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> The only/main purpose is to free disk space..
>>>>>>>>>>
>>>>>>>>>> I was a little bit concerned about this operation, but now, with
>>>>>>>>>> your feedback, I'm tending not to do anything; I can't risk the space
>>>>>>>>>> growing...
>>>>>>>>>> Regarding the overhead, I think that with a tight throttling system I
>>>>>>>>>> could control it and avoid overloading the cluster.
>>>>>>>>>>
>>>>>>>>>> Mixed feelings :S
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 18 February 2014 15:45, Matthew Von-Maszewski <
>>>>>>>>>> matth...@basho.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Edgar,
>>>>>>>>>>>
>>>>>>>>>>> The first "concern" I have is that leveldb's delete does not free
>>>>>>>>>>> disk space.  Others have executed mass delete operations only to
>>>>>>>>>>> discover they are now using more disk space instead of less.  Here is
>>>>>>>>>>> a discussion of the problem:
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/basho/leveldb/wiki/mv-aggressive-delete
>>>>>>>>>>>
>>>>>>>>>>> The link also describes Riak's database operation overhead.  This
>>>>>>>>>>> is a second "concern".  You will need to carefully throttle your
>>>>>>>>>>> delete rate or the overhead will likely impact your production
>>>>>>>>>>> throughput.
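>>>>>>>>>>>
>>>>>>>>>>> Purely as an illustration of what "throttle" means here, an untested
>>>>>>>>>>> sketch (host, bucket name and sleep value are placeholders to tune):
>>>>>>>>>>>
>>>>>>>>>>> %% read keys from a file, issue one HTTP DELETE at a time, pause between
>>>>>>>>>>> -module(throttled_delete).
>>>>>>>>>>> -export([run/1]).
>>>>>>>>>>>
>>>>>>>>>>> run(KeyFile) ->
>>>>>>>>>>>     inets:start(),
>>>>>>>>>>>     {ok, Bin} = file:read_file(KeyFile),
>>>>>>>>>>>     Keys = binary:split(Bin, <<"\n">>, [global, trim]),
>>>>>>>>>>>     lists:foreach(fun(Key) ->
>>>>>>>>>>>         Url = "http://127.0.0.1:8098/buckets/mybucket/keys/"
>>>>>>>>>>>               ++ binary_to_list(Key),
>>>>>>>>>>>         httpc:request(delete, {Url, []}, [], []),
>>>>>>>>>>>         timer:sleep(10)   %% crude throttle; tune to your cluster
>>>>>>>>>>>     end, Keys).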
>>>>>>>>>>>
>>>>>>>>>>> We have new code to help quicken the actual purge of deleted
>>>>>>>>>>> data in Riak 2.0.  But that release is not quite ready for 
>>>>>>>>>>> production usage.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What do you hope to achieve by the mass delete?
>>>>>>>>>>>
>>>>>>>>>>> Matthew
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Feb 18, 2014, at 10:29 AM, Edgar Veiga <edgarmve...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Sorry, forgot that info!
>>>>>>>>>>>
>>>>>>>>>>> It's leveldb.
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 18 February 2014 15:27, Matthew Von-Maszewski <
>>>>>>>>>>> matth...@basho.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Which Riak backend are you using:  bitcask, leveldb, multi?
>>>>>>>>>>>>
>>>>>>>>>>>> Matthew
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Feb 18, 2014, at 10:17 AM, Edgar Veiga <
>>>>>>>>>>>> edgarmve...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> > Hi all!
>>>>>>>>>>>> >
>>>>>>>>>>>> > I have a fairly trivial question regarding mass deletion on a
>>>>>>>>>>>> > Riak cluster, but firstly let me give you just some context. My
>>>>>>>>>>>> > cluster is running Riak 1.4.6 on 6 machines, with a ring size of
>>>>>>>>>>>> > 256 and 1TB SSD disks.
>>>>>>>>>>>> >
>>>>>>>>>>>> > I need to execute a massive object deletion on a bucket; I'm
>>>>>>>>>>>> > talking of ~1 billion keys (the average object size is ~1KB). I
>>>>>>>>>>>> > will not retrieve the keys from Riak because I have a file with
>>>>>>>>>>>> > all of them. I'll just start a script that reads them from the
>>>>>>>>>>>> > file and triggers an HTTP DELETE for each one.
>>>>>>>>>>>> > The cluster will continue running in production with quite a
>>>>>>>>>>>> > high load, serving all other applications, while running this
>>>>>>>>>>>> > deletion.
>>>>>>>>>>>> >
>>>>>>>>>>>> > My question is simple: do I need to have any extra concerns
>>>>>>>>>>>> > regarding this action? Do you advise me to pay special attention
>>>>>>>>>>>> > to any metrics regarding Riak, or even the servers where it's
>>>>>>>>>>>> > running?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Best regards!
>>>>>>>>>>>> > _______________________________________________
>>>>>>>>>>>> > riak-users mailing list
>>>>>>>>>>>> > riak-users@lists.basho.com
>>>>>>>>>>>> >
>>>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>