I know it's been over two years since this post, but I'm wondering whether
the latest version of Riak has improved list keys. I tried the query with
"keys=true" and didn't seem to hit any TSA/octomom-related wait times.
I was originally hoping to get a list of keys via the RESTful API, which is
what led me to this thread. In other words, a GET url/bucket/key does
indeed return what I shoved into the bucket at that key, but I was hoping
that a GET url/bucket (I guess to be truly RESTful, I should make the bucket
plural) would return the keys.
Thoughts?
Thanks in advance, Chuck
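
For anyone trying the "?keys=stream" variant recommended in the quoted reply below, here is a minimal sketch of how the streamed response could be consumed. It assumes the body arrives as a series of concatenated JSON objects, some of which carry a "keys" array, and that an object may be split across read boundaries. The function name iter_streamed_keys is made up for illustration and is not part of any Riak client.

```python
import json
from typing import Iterable, Iterator


def iter_streamed_keys(chunks: Iterable[str]) -> Iterator[str]:
    """Yield keys from text chunks holding concatenated JSON objects.

    Assumption: each object looks like {"keys": ["a", "b"]} (objects
    without a "keys" field are skipped), and an object may be split
    across two or more chunks.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        # raw_decode rejects leading whitespace, so strip it first.
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # Incomplete object: wait for the next chunk.
                break
            buffer = buffer[end:].lstrip()
            if isinstance(obj, dict):
                for key in obj.get("keys", []):
                    yield key
```

The chunks could come from reading an HTTP response for url/bucket?keys=stream piece by piece. Because this is a generator, the caller can start working on the first keys before the last node has answered, which is the whole point of streaming over "keys=true".
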
Alexander Sicular wrote
> Hi Thomas,
>
> This is a topic that has come up many times. Lemme just hit a couple of
> high notes in no particular order:
>
> - If you must do a list keys op on a bucket, you must must must use
> "?keys=stream". True will block on the coordinating node until all nodes
> return their keys. Stream will start sending keys as soon as the first
> node returns.
>
> - "list keys" is one of the most expensive native operations you can
> perform in Riak. It does a full key scan not just of the keys in your
> bucket, but of all the keys in your cluster. It is obnoxiously expensive
> and only more so as the number of keys in your cluster grows. There have
> been discussions about changing this but everything comes with a cost
> (more open file descriptors) and I do not believe a decision has been made
> yet.
>
> - Riak is in no way a relational system. It is, in fact, about as opposite
> as you can get. Incidentally, "select *" is generally not recommended in
> the Kingdom of Relations and regarded as wasteful. You need a bit of a
> mind shift from the relational world to have success with NoSQL in general
> and Riak in particular.
>
> - There are no native indices in Riak. By default Riak uses the Bitcask
> backend. Bitcask has many advantages but one disadvantage is that all keys
> (key length + a bit of overhead) must fit in RAM.
>
> - Do not use "?keys=true". Your computer will melt. And then your face.
>
> - As of Riak 0.14, your m/r jobs can filter on key name. I would highly
> recommend that your data architecture take this into account by using keys
> that have meaningful names. This will allow you to not scan every key in
> your cluster.
>
> - Buckets are analogous to relational tables but only just. In Riak, you
> can think of a bucket as a namespace holder (it is used as part of the
> default circular hash function) but primarily as a mechanism to
> differentiate system settings from one group of keys to the next.
>
> - There is no penalty for unlimited buckets except when their settings
> deviate from the system defaults. By settings I mean things like hooks,
> replication values and backends, among others.
>
> - One should list keys by truth if one enjoys sitting in parking lots on
> the freeway on a scorching summer's day or perhaps waiting in a TSA line at
> your nearest international point of embarkation surrounded by octomom
> families all the while juggling between the grope or the pr0n slideshow.
> If that is for you, use "?keys=true".
>
> - Virtually everything in Riak is transient. Meaning, for the most part
> (not including the 60 seconds or so of m/r cache), there is no caching
> going on in Riak outside of the operating system. I.e., your subsequent
> queries will do more or less the same work as their predecessors. You need
> to cache your own results if you want to reuse them... quickly.
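
To make the key-filter point above concrete, here is a rough sketch of what an m/r job with a key filter might look like as JSON. The "key_filters" input, the "starts_with" predicate and the riak_kv_mapreduce:reduce_identity phase are written from memory of the Riak docs, so treat them as assumptions and check them against your version. The helper name key_filter_job and the example bucket/prefix are made up for illustration.

```python
import json


def key_filter_job(bucket: str, prefix: str) -> str:
    """Build a JSON m/r job selecting keys in `bucket` that start with `prefix`.

    Assumption: the job is meant to be POSTed to the cluster's m/r
    endpoint; the field names below follow the key-filter syntax as
    recalled from the Riak documentation and are not verified here.
    """
    job = {
        "inputs": {
            "bucket": bucket,
            # One filter step: keep keys beginning with the given prefix.
            "key_filters": [["starts_with", prefix]],
        },
        "query": [
            # Pass the matching bucket/key pairs through unchanged.
            {
                "reduce": {
                    "language": "erlang",
                    "module": "riak_kv_mapreduce",
                    "function": "reduce_identity",
                }
            }
        ],
    }
    return json.dumps(job)
```

With keys named like "2011-01-invoice-17", key_filter_job("invoices", "2011-01") would describe a job that returns only January's keys. How much work the cluster saves compared to a plain list keys is something I have not measured, so it is worth testing before relying on it.
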
>
>
>
> Oh, there's more but I'm pretty jelloed from last night. Welcome to the
> fold, Thomas. Can I call you Tom?
>
> Cheers,
> -Alexander Sicular
>
> @siculars
>
> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>
>> I've been playing around with Riak lately as really my first usage of a
>> distributed key/value store. I quite like many of the concepts and
>> possibilities of Riak and what it may deliver, however I'm really stuck
>> on an issue.
>>
>> Doing the equivalent of a "select * from sometable" in Riak is seemingly
>> slow. As a quick test I tried...
>>
>> http://localhost:8098/riak/mytable?keys=true
>>
>> Before even iterating over the keys this was unbearably slow already.
>> This took almost half a second on my machine where mytable is completely
>> empty!
>>
>> I'm a little baffled; I would assume that getting all the keys of a table
>> is an incredibly common task? How do I get all the keys of a table
>> quickly? By quickly I mean a few milliseconds or less, as I would expect
>> of even a "slow" RDBMS with an empty table. Even SQL tables with
>> thousands of rows can return all their primary keys in a few
>> milliseconds.
>>
>> Tom Burdick
>>
>> ___
>> riak-users mailing list
>>
>> riak-users@.basho
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com