The best (fastest) way to delete/clear a bucket [python]
Hi,

For testing, I'd like to be able to throw a large amount of data at Riak (100k+ entries), check how it performed, change something in the application, and run the test again. I'd like to use the same data every time, so I'd like to clear the bucket between tests.

The documentation (http://docs.basho.com/riak/2.0.0beta1/dev/references/http/) says:

*Delete Buckets*
There is no straightforward way to delete an entire Bucket. To delete all the keys in a bucket, you’ll need to delete them all individually.

So, I'm currently using something like:

    for k in r_bk.get_keys():
        v = r_bk.get(k)
        if v.exists:
            r_bk.delete(v)

The problem is that r_bk.get_keys() returns a lot of elements that don't exist (tombstones?), and iterating over all of them takes time.

Is that the way it's supposed to work? Or am I missing something?

- I'm using the default delete_mode configuration (3 seconds)
- I'm using Riak 2.0 alpha 19 with Python (there's a bug with strong consistency in Beta1, so I cannot use it)
- changing the bucket name for every run seems .. impractical?

Any advice welcome,

--
Thanks,
Paweł

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: The best (fastest) way to delete/clear a bucket [python]
The problem is that the tombstones never disappear - they keep coming back through bucket.get_keys() hours after deletion, even after a restart.

I said I'm using the default delete_mode configuration because I didn't change it. I have now tried to set it, and apparently it's not supported any more in Riak 2.0:

17:16:56.318 [error] You've tried to set delete_mode, but there is no setting with that name.
17:16:56.318 [error] Did you mean one of these?
17:16:56.335 [error] dtrace
17:16:56.335 [error] nodename
17:16:56.335 [error] ssl.keyfile
17:16:56.335 [error] Error generating configuration in phase transform_datatypes
17:16:56.335 [error] Conf file attempted to set unknown variable: delete_mode
Error generating config with cuttlefish

I'm using Riak 2.0.0pre20, on strongly consistent buckets, on a single-node cluster. Can this be the reason? I guess what I need is confirmation that something is broken or that I'm doing something stupid.

I've tried looking for similar issues (github.com/basho/riak/issues) and didn't find any, which I guess suggests I'm doing something stupid; I just don't know what yet.

Thanks again :)

--
Paweł

On 19 May 2014 18:00, Dmitri Zagidulin wrote:

> Ah, yes, you bring up a good point. (And that's another subtlety to keep
> in mind with Option #1.)
>
> Tombstones are definitely something to keep in mind when deleting unit
> test data.
> As you mentioned in your earlier question, if you're using the default
> delete_mode configuration (3 seconds), it means that if you issue a
> delete, a tombstone object is going to be written (and stick around for at
> least 3 seconds), and unfortunately, it is going to show up as a false
> positive on a List Keys call.
>
> The easiest thing to try, in your case, is to set 'delete_mode' to
> 'immediate', restart the test cluster, and retest. With an immediate
> delete, your second test with 10 keys should not take as long as the
> previous delete with 10 000 keys.
>
> On Mon, May 19, 2014 at 11:46 AM, Paweł Królikowski wrote:
>
>> Hi Dmitri,
>>
>> Thanks a lot for the answer. Option #1 seems the best, but I have a
>> follow-up question:
>>
>> - When do the deleted keys disappear from Riak? Part of my problem
>> (I did not explain it correctly the first time) is that get_keys()
>> returns keys that no longer exist. So, I run a test with 10 000 keys, I
>> remove them, and it takes N seconds. I then follow with a test with 10 keys,
>> but removing them takes just as much time - I imagine it's because I'm
>> going over those 10 000 keys again.
>>
>> This article seems relevant:
>> http://basho.com/riaks-config-behaviors-part-3/ - it seems like the
>> tombstones simply remain in my system indefinitely.
>>
>> --
>> Paweł
>>
>>
>> On 19 May 2014 15:32, Dmitri Zagidulin wrote:
>>
>>> Hi Pawel,
>>>
>>> There are basically three ways to clear data from Riak (for the purposes
>>> of automated testing):
>>>
>>> 1. Iterate through the keys via get_keys(), and delete each one. This is
>>> what you're currently doing, except you don't need the get()/exists
>>> check. The check costs an additional API call to Riak, so it takes twice
>>> as long as just calling delete() (and trapping a potential 404 "doesn't
>>> exist" error).
>>>
>>> Advantages: Easy to understand, can be done entirely in code (without
>>> invoking OS/shell commands).
>>>
>>> Disadvantages: It can get slow for large data sets. Another subtle
>>> disadvantage is that, as your app grows, it can get difficult to keep track
>>> of which buckets you've created and need to clear.
>>>
>>> 2. Stop the Riak cluster, delete the riak data directory, and re-start.
>>>
>>> Advantages: Very fast, and you can be sure that you're deleting all
>>> buckets.
>>>
>>> Disadvantages: Involves invoking OS/shell commands. This is fairly easy
>>> if your Riak node is running on the same machine as your tests (and if it's
>>> a single node). To delete the data directories of a multi-node cluster, now
>>> you need to involve either a bash script that uses SSH to log in and
>>> restart, or a coordination framework like Ansible.
>>>
>>> 3. Use an in-memory back end. (And to drop all data, just restart the
>>> node(s)).
>>>
>>> Advantages: Same as #2 - fast, thorough.
>>>
>>> Disadvantages: Same as #2 (involves shell commands, potentially SSH
>>> etc). In ad
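
Option #1 from the quoted reply can be sketched roughly as below: one delete request per listed key, with no get()/exists round trip first. This is only an illustration, not the client's own API; it assumes `bucket` behaves like the Python client's bucket object used in this thread (get_keys() and delete(key)), and the helper name clear_bucket is made up.

```python
def clear_bucket(bucket):
    """Issue one delete per key that the bucket currently lists.

    Assumption: `bucket` offers get_keys() and delete(key), as the
    Python client's bucket object does in this thread.
    Returns the number of delete requests issued.
    """
    deleted = 0
    for key in bucket.get_keys():
        # No fetch and no `exists` test: a single request per key.
        bucket.delete(key)
        deleted += 1
    return deleted
```

The return value counts delete requests, not objects actually removed, because a key listing may still contain tombstones.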
Re: The best (fastest) way to delete/clear a bucket [python]
Ok then,

I've stopped riak, wiped the bitcask and anti_entropy directories, updated the config, and started riak.

I've tried to verify it with:

riak config generate -l debug

Got output:

[...]

10:25:46.260 [info] /etc/riak/advanced.config detected, overlaying proplists
 -config /var/lib/riak/generated.configs/app.2014.05.20.10.25.46.config -args_file /var/lib/riak/generated.configs/vm.2014.05.20.10.25.46.args -vm_args /var/lib/riak/generated.configs/vm.2014.05.20.10.25.46.args

And at the very end of the config file there's:

{riak_kv,[{delete_mode,immediate}]}].

So, it worked.

Then did this:

>>> import riak
>>> c = riak.RiakClient(pb_port=8087, protocol='pbc', host='db-13')
>>> b = c.bucket(name='locate', bucket_type='strongly_consistent')
>>> o = b.get('foo')
>>> o.data = 3
>>> o.store()
>>> o.delete()
>>> b.delete('foo')
>>> o.exists
False
>>> b.get_keys()
['foo']

So, it didn't work.

It's not just the Python client, because if I do this, I get the key back:

http://db-13:8098/types/strongly_consistent/buckets/locate/keys?keys=true
{"keys":["foo"]}

I've tried deleting the key via an HTTP request (curl -v -X DELETE http://db-13:8098/types/strongly_consistent/buckets/locate/keys/bar), but it still remains.

http://db-13:8098/types/strongly_consistent/buckets/locate/keys/foo

returns

not found

but

http://db-13:8098/types/strongly_consistent/buckets/locate/keys?keys=true

gives

{"keys":["foo","bar"]}

I've tried looking for detailed logs, but console.log, even on debug, doesn't print anything useful.
I've also tried looking inside the bitcask directory, and there's definitely 'some' binary data there, even after deletion.

On 19 May 2014 23:23, Dmitri Zagidulin wrote:

> Ah, that's interesting, let's see if we can test this.
>
> The 'delete_mode' configuration is not supported in the regular riak.conf
> file, from what I understand.
> However, you can still set it in the 'advanced.config' file, as described
> here:
>
> https://github.com/basho/basho_docs/blob/features/lp/advanced-conf/source/languages/en/riak/ops/advanced/configs/configuration-files.md#the-advancedconfig-file
> (those docs are a current work-in-progress, mind you)
>
> So, create an advanced.config file in your riak etc/ directory (this will
> be in addition to your existing riak.conf), with the following contents:
> [
>  {riak_kv, [
>    {delete_mode, immediate}
>  ]}
> ].
>
> Restart the node, and try your tests again. The tombstones should
> disappear now on every delete request. (You should probably also wipe all
> of the old data, by deleting the contents of the bitcask and anti_entropy
> directories in your riak data dir, just to make sure the old ones are gone.
> This should be done while the node is down, of course.)
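
The stop / wipe / start sequence used in this message could be scripted along these lines for a single local node. It is a sketch under assumptions: the data directory path is whatever your install uses (it is passed in, not guessed), the process needs permission to run `riak` and to remove the files, and the function name and the `run` parameter are invented for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def reset_riak_data(data_dir, run=subprocess.check_call,
                    subdirs=("bitcask", "anti_entropy")):
    """Stop the local node, empty the given data subdirectories, restart.

    Assumptions: `riak` is on PATH, `data_dir` is the node's data
    directory, and only a single local node is involved.
    """
    run(["riak", "stop"])
    for name in subdirs:
        target = Path(data_dir) / name
        if not target.is_dir():
            continue
        # Remove the contents but keep the directory itself, so its
        # ownership and permissions stay as Riak created them.
        for child in target.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
    run(["riak", "start"])
```

A test suite would call something like reset_riak_data("/path/to/riak/data") once before each run; `run` is injectable so the sequence can be exercised without touching a real node.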
Re: The best (fastest) way to delete/clear a bucket [python]
@Dmitri - cool, thanks. Now that I know it's expected behaviour, even if I think it's strange, I can find a way of working around it :)

@Sean - tbh, I don't know. I was trying to test a whole application, involving http requests + multiple consumers over rabbitmq with semi-real data, so random bucket/key names sound .. wrong (& complicated?). On the other hand, restarting riak & nuking the data directory, possibly on a multi-node cluster, doesn't seem that much better.

I'll play with the tests a little longer; I'll come up with something that works.

Anyway, thanks for the help :)

On 20 May 2014 15:50, Sean Cribbs wrote:

> For what it's worth, in the integration tests of our client libraries we
> have moved to generating random bucket and key names for each test/example.
> This reduces setup/teardown time and is less susceptible to the types of
> unexpected behaviors you are seeing from list-keys. If possible, I highly
> recommend this approach in your suite.
>
>
> On Tue, May 20, 2014 at 9:25 AM, Dmitri Zagidulin wrote:
>
>> Ok, so, from what I understand, this is going to be expected behavior
>> from strongly consistent buckets. (I'm in the process of confirming this,
>> and we'll see if we can add it to the documentation.) The delete_mode:
>> immediate setting is ignored, and the tombstone is kept around to ensure
>> the consistency of not found, etc. (in the context of further overwrites
>> of that key).
>>
>> So, unfortunately that may be bad news in terms of deleting a
>> strongly_consistent bucket via keylist for unit testing. :)
>>
>> You may want to switch to method #2 for your test suite. (Write a shell
>> script to stop the node, delete the bitcask & aae dirs, and restart. And
>> invoke it as a shell script command from your test suite. Or just call
>> those commands directly.)
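
The random-name approach Sean describes might look like the sketch below: each test run gets a fresh bucket name, so nothing has to be deleted or listed between runs. The function name and the prefix convention are assumptions for illustration, not something taken from the client libraries' test suites.

```python
import uuid


def unique_bucket_name(prefix="test"):
    """Return a bucket name that is very unlikely to repeat across runs.

    The "<prefix>-<32 hex chars>" layout is an arbitrary choice.
    """
    return "{}-{}".format(prefix, uuid.uuid4().hex)
```

For an application with several cooperating processes, the name would be generated once per run and handed to every component through whatever configuration mechanism they already share.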
Re: Question on "link walking: Deprecation Notice"
Hi,

I'm sorry for resurrecting an old thread, but .. well, I'm basically asking the same question as Alexander - will there be a way of link walking in future Riaks?

We're currently looking into ways of moving some of our data from Oracle into a nosql database. The vast majority of the data will be structured into a 4-tier ... structure - level 1 contains multiple level 2 elements, level 2 contains multiple level 3 elements, etc. The objects will not be big, so we're fine with storing them as a single document keyed by the level-1 object's id - the data would be used primarily for web lookup/display.

However, there's also a different group of documents that logically references objects from the hierarchical data, at any level - it's basically a grouping of a subset of the above elements. Usually a group should contain around 20-100 elements, with 500 being a very, very rare case.

Since an element can be in multiple groups at the same time, denormalization seemed too expensive. Link walking seemed like a good idea - a single query would return all the required data.

Alternatively, we were thinking about having a separate data store, possibly an in-memory one, and then using a multi-get (the python client seems to support it: http://basho.github.io/riak-python-client/bucket.html), but we're not sure about the performance penalty.

Any suggestions will be welcome; we're not bound to any technology atm, Riak is just a possible option.

--
Paweł

On 13 May 2014 13:26, Alexander Grytsenko wrote:

> BTW - I'm using the python riak-python-client (v2.0.3) and it allows you to
> use link walking and chaining, but returns back a bucket/key/tag tuple that
> allows you to fetch the needed object via a secondary request.
>
> Will this also be deprecated?
>
>
> On Tue, May 13, 2014 at 3:19 PM, Alexander Grytsenko <
> alexander.grytse...@dev-pro.net> wrote:
>
>>
>> Hi all,
>>
>> I'm a bit confused by the deprecation notice on the official
>> docs.basho page that explains link walking. See here:
>> http://docs.basho.com/riak/2.0.0beta1/dev/using/link-walking/
>>
>> Basically, it says that starting from v2.0 the Link Walking feature is
>> marked as deprecated and is going to be removed.
>>
>>
>> As I understand it, link walking allows you to "fetch a related object" by
>> a url like:
>>
>>
>> curl -v http://127.0.0.1:8091/riak/people/timoreilly/people,friend,1
>> >...
>> < {'name': 'dave'} # another object with a key=dave who appears to be a
>> friend of tim
>>
>> Clearly this functionality will be removed.
>>
>> But does it mean the 'Links' (as a key's metadata) will also be deprecated
>> and removed?
>> Or will I be able to store Links in an object's metadata in v2.0 or
>> 3.0 or whatever..?
>> Or is it going to be possible to travel over links via map/reduce in
>> future versions?
>>
>> Should I think now about storing additional info somewhere inside my
>> object to keep simple relations in the future?
>> (and convert all my current links to the object's inner data)
>>
>> Or is the deprecation warning above related only to "Link Walking" (on
>> the server side, with getting the object back), but not to the "Links"
>> themselves?
>>
>>
>> Thanks for keeping this thread,
>> alex
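
The multi-get alternative mentioned at the top of this message could be sketched as follows: read the group document, then fetch all referenced elements in one multiget call. The document layout (a "members" list of element keys) and the function name are hypothetical; the sketch assumes both buckets behave like the Python client's bucket object (get(key) and multiget(keys)), and it says nothing about the performance question raised above.

```python
def fetch_group_members(group_bucket, element_bucket, group_id):
    """Return the data of every element referenced by one group document.

    Assumptions: the group document stores its references as a list
    under the hypothetical field "members", and both buckets offer
    get(key) / multiget(keys) like the Python client's bucket object.
    """
    group = group_bucket.get(group_id)
    if not group.exists:
        return []
    member_keys = group.data["members"]
    results = element_bucket.multiget(member_keys)
    # Skip anything that is not an existing object (missing keys,
    # or error placeholders a client may return instead of objects).
    return [obj.data for obj in results if getattr(obj, "exists", False)]
```

With groups of 20-100 elements this is two round trips from the application's point of view: one for the group, one multi-get for its members.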