Re: Getting all the Keys

Neville Burnell Sat, 22 Jan 2011 14:44:07 -0800

> As of Riak 0.14 your m/r can filter on key name. I would highly recommend
> that your data architecture take this into account by using keys that have
> meaningful names.
>
> This will allow you to not scan every key in your cluster.

Is this part true?

I understood that key filtering just means you don't have to fetch the
'value' from the backend (bitcask or innostore). How would it help with
respect to scanning every key? Without a 'secondary index/set' somewhere,
you would still need to scan every key in the cluster to find all the keys
that match your filter.
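
To make the question above concrete, here is a minimal Python sketch (not
taken from the thread). build_key_filter_job assembles a MapReduce job body
that uses the "starts_with" key filter and the built-in Riak.mapValuesJson
map function; the bucket name "invoices" and the "2011-" prefix are made-up
examples. simulate_key_filter is a purely local, hypothetical model of the
behaviour described above, assuming no secondary index exists: every key is
still examined, and only the matching ones would need their values fetched.
It does not talk to Riak and makes no claim about Riak's internals.

```python
import json


def build_key_filter_job(bucket, prefix):
    """Build a MapReduce job body whose inputs are narrowed by a key filter."""
    return {
        "inputs": {
            "bucket": bucket,
            # Key filters are applied to key names before any value is read.
            "key_filters": [["starts_with", prefix]],
        },
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",
                     "keep": True}}
        ],
    }


def encode_job(job):
    """Serialize the job body as the JSON text that would be sent to Riak."""
    return json.dumps(job)


def simulate_key_filter(all_keys, prefix):
    """Hypothetical local model: count keys examined versus keys matched.

    Assumes there is no index on key names, so each key must be looked at.
    """
    examined = 0
    matched = []
    for key in all_keys:
        examined += 1
        if key.startswith(prefix):
            matched.append(key)
    return examined, matched
```

The encoded job would be POSTed to the /mapred resource. In the local model
the number of keys examined always equals the total number of keys, however
selective the prefix is; the saving is only in how many values get fetched.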

Kind Regards

Nev

On 23 January 2011 03:31, Alexander Sicular <sicul...@gmail.com> wrote:

> Hi Thomas,
>
> This is a topic that has come up many times. Lemme just hit a couple of
> high notes in no particular order:
>
> - If you must do a list keys op on a bucket, you must must must use
> "?keys=stream". True will block on the coordinating node until all nodes
> return their keys. Stream will start sending keys as soon as the first node
> returns.
>
> - "list keys" is one of the most expensive native operations you can
> perform in Riak. Not only does it do a full key scan of all the keys in your
> bucket, but all the keys in your cluster. It is obnoxiously expensive and
> only more so as the number of keys in your cluster grows. There have been
> discussions about changing this but everything comes with a cost (more open
> file descriptors) and I do not believe a decision has been made yet.
>
> -Riak is in no way a relational system. It is, in fact, about as opposite
> as you can get. Incidentally, "select *" is generally not recommended in the
> Kingdom of Relations and regarded as wasteful. You need a bit of a mind
> shift from relational world to have success with nosql in general and Riak
> in particular.
>
> -There are no native indices in Riak. By default Riak uses the bitcask
> backend. Bitcask has many advantages but one disadvantage is that all keys
> (key length + a bit of overhead) must fit in ram.
>
> -Do not use "?keys=true". Your computer will melt. And then your face.
>
> -As of Riak 0.14 your m/r can filter on key name. I would highly recommend
> that your data architecture take this into account by using keys that have
> meaningful names. This will allow you to not scan every key in your cluster.
>
> -Buckets are analogous to relational tables but only just. In Riak, you can
> think of a bucket as a namespace holder (it is used as part of the default
> circular hash function) but primarily as a mechanism to differentiate system
> settings from one group of keys to the next.
>
> -There is no penalty for unlimited buckets except for when their settings
> deviate from the system defaults. By settings I mean things like hooks,
> replication values and backends among others.
>
> -One should list keys by truth if one enjoys sitting in parking lots on the
> freeway on a scorching summer's day or perhaps waiting in a TSA line at your
> nearest international point of embarkation surrounded by octomom families
> all the while juggling between the grope or the pr0n slideshow. If that is
> for you, use "?keys=true".
>
> -Virtually everything in Riak is transient. Meaning, for the most part (not
> including the 60 seconds or so of m/r cache), there is no caching going on
> in Riak outside of the operating system. I.e., your subsequent queries will do
> more or less the same work as their predecessors. You need to cache your own
> results if you want to reuse them... quickly.
>
>
>
> Oh, there's more but I'm pretty jelloed from last night. Welcome to the
> fold, Thomas. Can I call you Tom?
>
> Cheers,
> -Alexander Sicular
>
> @siculars
>
> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>
> > I've been playing around with riak lately as really my first usage of a
> > distributed key/value store. I quite like many of the concepts and
> > possibilities of Riak and what it may deliver, however I'm really stuck on
> > an issue.
> >
> > Doing the equivalent of a select * from sometable in riak is seemingly
> > slow. As a quick test I tried...
> >
> > http://localhost:8098/riak/mytable?keys=true
> >
> > Before even iterating over the keys this was unbearably slow already.
> > This took almost half a second on my machine where mytable is completely
> > empty!
> >
> > I'm a little baffled, I would assume that getting all the keys of a table
> > is an incredibly common task? How do I get all the keys of a table quickly?
> > By quickly I mean a few milliseconds or less as I would expect of even a
> > "slow" rdbms with an empty table, even some tables with 1000's of items can
> > get all the primary keys of a sql table in a few milliseconds.
> >
> > Tom Burdick
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

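On the "?keys=stream" advice quoted above: a minimal Python sketch of
consuming such a response incrementally instead of waiting for one large
body. It assumes the streamed body is a sequence of concatenated JSON
objects, some of which carry a "keys" array, for example
{"keys":["a","b"]}{"keys":["c"]}. That framing is an assumption here and
should be checked against your Riak version. iter_streamed_keys is a
hypothetical helper, not part of any Riak client, and it accepts any
iterable of text chunks so it can be tried without a cluster.

```python
import json


def iter_streamed_keys(chunks):
    """Yield keys from text chunks that hold concatenated JSON objects.

    A JSON object may be split across chunk boundaries, so undecoded text
    is buffered until a complete object is available.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except ValueError:
                # Incomplete object so far; wait for the next chunk.
                break
            buffer = buffer[end:]
            # Objects without a "keys" field (assumed possible) are skipped.
            for key in obj.get("keys", []):
                yield key
```

Because this is a generator, the caller can start working on the first keys
while later chunks are still arriving, which is the point of streaming. It
does nothing to make the underlying key listing cheaper on the cluster.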