Re: Getting all the Keys

Alexander Staubo Sat, 22 Jan 2011 10:35:03 -0800

On Sat, Jan 22, 2011 at 18:23, Thomas Burdick
<tburd...@wrightwoodtech.com> wrote:
> So really, what's the solution for just keeping a list of around 50k keys that
> can be appended to quickly and retrieved later without taking seconds? Or
> is this just not a valid use case for Riak at all? That would suck, because
> again, I really like the notion of an AP-oriented database!


I have been struggling with the same issue. You may want to look at
Cassandra, which handles sequential key range traversal very well.
Riak also has a problem with buckets sharing the same data storage
(buckets are essentially just a way to namespace keys), so if you have
two buckets and fill up one of them, then enumerating the keys of the
empty bucket will take a long time even though it holds no keys.

> Tom Burdick
>
>
> On Sat, Jan 22, 2011 at 10:31 AM, Alexander Sicular <sicul...@gmail.com>
> wrote:
>>
>> Hi Thomas,
>>
>> This is a topic that has come up many times. Lemme just hit a couple of
>> high notes in no particular order:
>>
>> - If you must do a list keys op on a bucket, you must must must use
>> "?keys=stream". "?keys=true" will block on the coordinating node until all
>> nodes return their keys. "?keys=stream" will start sending keys as soon as
>> the first node returns.
>>
>> - "list keys" is one of the most expensive native operations you can
>> perform in Riak. It does a full scan not only of the keys in your
>> bucket, but of all the keys in your cluster. It is obnoxiously expensive and
>> only more so as the number of keys in your cluster grows. There have been
>> discussions about changing this, but everything comes with a cost (more open
>> file descriptors), and I do not believe a decision has been made yet.
>>
>> - Riak is in no way a relational system. It is, in fact, about as opposite
>> as you can get. Incidentally, "select *" is generally not recommended in the
>> Kingdom of Relations and regarded as wasteful. You need a bit of a mind
>> shift from the relational world to have success with NoSQL in general and
>> Riak in particular.
>>
>> - There are no native indices in Riak. By default Riak uses the Bitcask
>> backend. Bitcask has many advantages, but one disadvantage is that all keys
>> (key length + a bit of overhead) must fit in RAM.
>>
>> - Do not use "?keys=true". Your computer will melt. And then your face.
>>
>> - As of Riak 0.14 your m/r can filter on key name. I would highly recommend
>> that your data architecture take this into account by using keys that have
>> meaningful names. This will allow you to not scan every key in your cluster.
>>
>> - Buckets are analogous to relational tables but only just. In Riak, you
>> can think of a bucket as a namespace holder (it is used as part of the
>> default circular hash function) but primarily as a mechanism to
>> differentiate system settings from one group of keys to the next.
>>
>> - There is no penalty for unlimited buckets except for when their settings
>> deviate from the system defaults. By settings I mean things like hooks,
>> replication values and backends among others.
>>
>> - One should list keys by truth if one enjoys sitting in parking lots on
>> the freeway on a scorching summer's day or perhaps waiting in a TSA line at
>> your nearest international point of embarkation surrounded by octomom
>> families all the while juggling between the grope or the pr0n slideshow. If
>> that is for you, use "?keys=true".
>>
>> - Virtually everything in Riak is transient. Meaning, for the most part
>> (not including the 60 seconds or so of m/r cache), there is no caching going
>> on in Riak outside of the operating system. I.e., your subsequent queries will
>> do more or less the same work as their predecessors. You need to cache your
>> own results if you want to reuse them... quickly.
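
For what it's worth, here is a minimal Python sketch of consuming a
"?keys=stream" response incrementally. It assumes the body arrives as
back-to-back JSON objects, some of which carry a "keys" array (for example
{"keys":["a","b"]}). That response shape and the name iter_streamed_keys are
assumptions for illustration, not something stated in this thread, so check
them against your Riak version.

```python
import json
from typing import Iterable, Iterator


def iter_streamed_keys(chunks: Iterable[str]) -> Iterator[str]:
    """Yield keys from text chunks that hold back-to-back JSON objects.

    Assumption: each complete object is a dict that may contain a "keys" list.
    Objects may be split across chunk boundaries.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # Incomplete object: wait for the next chunk.
                break
            buffer = buffer[end:]
            # Hand keys to the caller as soon as an object is complete.
            yield from obj.get("keys", [])
```

The chunks could come from reading the HTTP response for
http://localhost:8098/riak/mytable?keys=stream piece by piece. The point is
that the caller sees keys as each object completes, instead of waiting for the
whole listing the way "?keys=true" does.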
>>
>>
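
A rough sketch of the key-filter idea from the list above: a helper that
builds a MapReduce job body whose input is narrowed by key prefix. The
"key_filters" / "starts_with" layout and the Riak.mapValuesJson map phase are
written from memory of the Riak documentation and should be treated as
assumptions. build_prefix_job and the "2011-01" prefix are made up for
illustration.

```python
import json


def build_prefix_job(bucket: str, prefix: str) -> str:
    """Return a JSON MapReduce job limited to keys starting with `prefix`.

    Assumption: the server accepts a "key_filters" list on the job inputs.
    """
    job = {
        "inputs": {
            "bucket": bucket,
            # Only keys beginning with the prefix are fed to the map phase.
            "key_filters": [["starts_with", prefix]],
        },
        "query": [
            # Assumed built-in JavaScript map function that returns values as JSON.
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }
    return json.dumps(job)


if __name__ == "__main__":
    print(build_prefix_job("mytable", "2011-01"))
```

As far as I know, a body like this is POSTed to the /mapred resource with
Content-Type application/json. It only helps if the key names encode something
you want to filter on, which is why meaningful key names matter.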
>>
>> Oh, there's more but I'm pretty jelloed from last night. Welcome to the
>> fold, Thomas. Can I call you Tom?
>>
>> Cheers,
>> -Alexander Sicular
>>
>> @siculars
>>
>> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>>
>> > I've been playing around with Riak lately, as really my first usage of a
>> > distributed key/value store. I quite like many of the concepts and
>> > possibilities of Riak and what it may deliver; however, I'm really stuck
>> > on an issue.
>> >
>> > Doing the equivalent of a "select * from sometable" in Riak is seemingly
>> > slow. As a quick test I tried...
>> >
>> > http://localhost:8098/riak/mytable?keys=true
>> >
>> > Before even iterating over the keys this was unbearably slow already.
>> > This took almost half a second on my machine where mytable is completely
>> > empty!
>> >
>> > I'm a little baffled; I would assume that getting all the keys of a
>> > table is an incredibly common task. How do I get all the keys of a table
>> > quickly? By quickly I mean a few milliseconds or less, as I would expect of
>> > even a "slow" RDBMS with an empty table; even with thousands of items,
>> > a SQL table can return all of its primary keys in a few milliseconds.
>> >
>> > Tom Burdick
>> >
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>

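On the point quoted above about caching your own results: a minimal sketch of
a client-side cache with a time-to-live, in Python. Nothing here is Riak API;
KeyListCache, the fetch callable, and the 60-second default are hypothetical
names and values chosen for illustration. The fetch function stands in for
whatever expensive listing call you actually make.

```python
import time
from typing import Callable, Dict, List, Tuple


class KeyListCache:
    """Cache per-bucket key lists on the client for a limited time."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # expensive call, e.g. a key listing
        self._ttl = ttl_seconds      # how long a cached result stays valid
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def keys(self, bucket: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(bucket)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: skip the expensive call
        result = self._fetch(bucket)
        self._entries[bucket] = (now, result)
        return result
```

The trade-off is staleness: keys written after the last fetch will not show
up until the entry expires, so pick a time-to-live your application can
tolerate.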