Re: Getting all the Keys

Jeremiah Peschka Sat, 22 Jan 2011 08:42:29 -0800

I was going to respond, but I think Alex answered it well with much more
humor than I can muster on a good day.


All I can add is:

   - Make sure you're on Riak 0.14.
   - Take a look at the filter documentation
   <http://wiki.basho.com/Key-Filters.html> and see how you can clean up
   your queries.
   - When you're structuring data think in terms of the queries you'll be
   running. Data duplication is fine.
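
To make the key-filter suggestion a bit more concrete, here is a minimal
Python sketch of a MapReduce job body that narrows its inputs by key name.
The bucket name "invoices", the "<customer>-<yyyymmdd>" key layout and the
helper build_key_filter_job are made up for illustration. The filter names
("tokenize", "eq") and the built-in Riak.mapValuesJson map function are as I
remember them from the Key-Filters page, so check them against the docs for
your version.

```python
import json

def build_key_filter_job(bucket, customer):
    """Build a MapReduce job that only considers keys for one customer.

    Assumes (hypothetically) that keys look like "<customer>-<yyyymmdd>",
    for example "basho-20110122".
    """
    return {
        "inputs": {
            "bucket": bucket,
            "key_filters": [
                ["tokenize", "-", 1],  # split the key on "-" and keep the 1st token
                ["eq", customer],      # keep keys whose token equals `customer`
            ],
        },
        "query": [
            # Built-in JavaScript map function that returns each value parsed as JSON.
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }

job = build_key_filter_job("invoices", "basho")
body = json.dumps(job)  # this string is what would be POSTed as the job
print(body)
```

The sketch only builds the JSON. Sending it (as far as I recall, a POST to
the /mapred resource with Content-Type application/json) is left out so the
example runs without a cluster.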

Riak isn't designed for complex relational algebra or range scans: it is
built for fetching a single key, or a handful of keys, at a time. MapReduce
is more of a batch processing operation.
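
A small sketch of what "pull single keys" can look like in practice, assuming
the HTTP interface from the original question (http://localhost:8098/riak)
and a hypothetical key scheme. If the application can compute a key from what
it already knows, each read is one GET on /riak/<bucket>/<key> and no key
listing is needed. The "sessions" bucket, the key layout and the helper names
are illustrative only.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8098/riak"  # host and port from the original question

def object_url(bucket, key, base_url=BASE_URL):
    """Return the URL of a single object; bucket and key are percent-encoded."""
    return f"{base_url}/{quote(bucket, safe='')}/{quote(key, safe='')}"

def session_key(user_id, day):
    """Hypothetical key scheme: one object per user per day."""
    return f"{user_id}-{day}"

def fetch(bucket, key):
    """Fetch one object by key. Needs a running node, so it is not called here."""
    with urlopen(object_url(bucket, key)) as resp:
        return resp.read()

print(object_url("sessions", session_key("tom", "20110122")))
# prints http://localhost:8098/riak/sessions/tom-20110122
```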

Also, Alex you're very coherent for a man who was "jelloed" last night.
Bravo.

Jeremiah Peschka
Microsoft SQL Server MVP
MCITP: Database Developer, DBA


On Sat, Jan 22, 2011 at 11:31 AM, Alexander Sicular <sicul...@gmail.com> wrote:

> Hi Thomas,
>
> This is a topic that has come up many times. Lemme just hit a couple of
> high notes in no particular order:
>
> - If you must do a list keys op on a bucket, you must must must use
> "?keys=stream". "?keys=true" will block on the coordinating node until all
> nodes return their keys. "?keys=stream" will start sending keys as soon as
> the first node returns.
>
> - "list keys" is one of the most expensive native operations you can
> perform in Riak. It does a full scan not just of the keys in your bucket
> but of all the keys in your cluster. It is obnoxiously expensive and only
> more so as the number of keys in your cluster grows. There have been
> discussions about changing this but everything comes with a cost (more open
> file descriptors) and I do not believe a decision has been made yet.
>
> - Riak is in no way a relational system. It is, in fact, about as opposite
> as you can get. Incidentally, "select *" is generally not recommended in the
> Kingdom of Relations and regarded as wasteful. You need a bit of a mind
> shift from the relational world to have success with NoSQL in general and
> Riak in particular.
>
> - There are no native indices in Riak. By default Riak uses the Bitcask
> backend. Bitcask has many advantages but one disadvantage is that all keys
> (key length + a bit of overhead) must fit in RAM.
>
> -Do not use "?keys=true". Your computer will melt. And then your face.
>
> -As of Riak 0.14 your m/r can filter on key name. I would highly recommend
> that your data architecture take this into account by using keys that have
> meaningful names. This will allow you to not scan every key in your cluster.
>
> -Buckets are analogous to relational tables but only just. In Riak, you can
> think of a bucket as a namespace holder (it is used as part of the default
> circular hash function) but primarily as a mechanism to differentiate system
> settings from one group of keys to the next.
>
> -There is no penalty for unlimited buckets except for when their settings
> deviate from the system defaults. By settings I mean things like hooks,
> replication values and backends among others.
>
> -One should list keys by truth if one enjoys sitting in parking lots on the
> freeway on a scorching summer's day or perhaps waiting in a TSA line at your
> nearest international point of embarkation surrounded by octomom families
> all the while juggling between the grope or the pr0n slideshow. If that is
> for you, use "?keys=true".
>
> -Virtually everything in Riak is transient. Meaning, for the most part (not
> including the 60 seconds or so of m/r cache), there is no caching going on
> in Riak outside of the operating system. I.e., your subsequent queries will do
> more or less the same work as their predecessors. You need to cache your own
> results if you want to reuse them... quickly.
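
On the "?keys=stream" point in the list above: a minimal sketch of consuming
a streamed key list from Python. It assumes, from my memory of the HTTP
interface rather than anything stated in this thread, that the streamed body
is a series of concatenated JSON objects, each carrying a "keys" array, and
that an object can be split across chunks. The function name and the fake
chunks are illustrative.

```python
import json

def iter_streamed_keys(chunks):
    """Yield keys from an iterable of text chunks of concatenated JSON objects.

    Objects may be split across chunk boundaries, so unparsed text is buffered
    until a complete object is available.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete object: wait for the next chunk
            buffer = buffer[end:]
            # Objects without a "keys" field (e.g. bucket properties) are skipped.
            yield from obj.get("keys", [])

# Simulated response body, split at an awkward place on purpose.
fake_chunks = ['{"keys":["a","b"]}{"ke', 'ys":["c"]}', '{"keys":[]}']
print(list(iter_streamed_keys(fake_chunks)))  # prints ['a', 'b', 'c']
```

With a live node the chunks would come from reading the response to
http://localhost:8098/riak/mytable?keys=stream piece by piece and decoding
the bytes to text. The point is that keys can be handled as they arrive
instead of after the whole listing has been assembled.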
>
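
On the "all keys must fit in RAM" point in the list above: a back-of-envelope
estimate in Python. The per-key overhead below is a placeholder I picked for
illustration, not a figure from this thread, so look up the real number for
your Bitcask version. The estimate also assumes every object is stored n_val
times across the cluster (3 being the usual default), which multiplies the
number of in-memory key entries.

```python
def keydir_bytes(num_keys, avg_key_len, overhead_per_key, n_val=3):
    """Estimate cluster-wide RAM needed to hold all keys in memory.

    Each key costs its length plus a fixed overhead, and every object is
    assumed to be stored n_val times across the cluster.
    """
    return num_keys * (avg_key_len + overhead_per_key) * n_val

# 100 million keys of about 30 bytes, with an ASSUMED 40 bytes of overhead each.
total = keydir_bytes(100_000_000, 30, 40)
print(f"{total / 2**30:.1f} GiB across the cluster")
```

Divide the result by the number of nodes for a rough per-node figure, and
leave headroom for everything else the machine has to do.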
>
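
On the "cache your own results" point in the list above: a minimal in-process
cache with a time-to-live, as one way an application might avoid repeating an
expensive query. This is a generic sketch and not part of any Riak client.
The class name, the 60 second TTL and the stand-in query function are all
illustrative.

```python
import time

class TTLCache:
    """Tiny in-process cache: entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]      # still fresh: reuse the cached result
        value = compute()      # e.g. run the expensive m/r job
        self._store[key] = (now + self.ttl, value)
        return value

calls = []

def expensive_query():
    calls.append(1)  # stand-in for a real MapReduce request
    return ["key-1", "key-2"]

cache = TTLCache(ttl=60)
cache.get_or_compute("all-invoices", expensive_query)
cache.get_or_compute("all-invoices", expensive_query)
print(len(calls))  # prints 1: the second call was served from the cache
```

For anything shared between processes you would want an external cache
instead, but the shape of the code stays the same.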
>
> Oh, there's more but I'm pretty jelloed from last night. Welcome to the
> fold, Thomas. Can I call you Tom?
>
> Cheers,
> -Alexander Sicular
>
> @siculars
>
> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>
> > I've been playing around with Riak lately as really my first usage of a
> > distributed key/value store. I quite like many of the concepts and
> > possibilities of Riak and what it may deliver, however I'm really stuck on
> > an issue.
> >
> > Doing the equivalent of a select * from sometable in Riak is seemingly
> > slow. As a quick test I tried...
> >
> > http://localhost:8098/riak/mytable?keys=true
> >
> > Before even iterating over the keys this was unbearably slow already.
> > This took almost half a second on my machine where mytable is completely
> > empty!
> >
> > I'm a little baffled, I would assume that getting all the keys of a table
> > is an incredibly common task? How do I get all the keys of a table quickly?
> > By quickly I mean a few milliseconds or less, as I would expect of even a
> > "slow" RDBMS with an empty table. Even with thousands of rows, a SQL table
> > can return all its primary keys in a few milliseconds.
> >
> > Tom Burdick
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>

