Re: Getting all the Keys

Jeremiah Peschka Sat, 22 Jan 2011 10:30:45 -0800

If you're looking for a fast in-memory store with support for ordered
lists, you should probably give Redis a look-see. It's an in-memory key-value
store, but it supports lists as a native data type:
http://redis.io/commands#list

You could do the same thing in Riak, but you'd be storing your list as the
value, retrieving it by key, deserializing it, adding the new item to the
list, serializing it again, and then persisting it back to the database. I
say this assuming that you want a true ordered list and not just some messy
unordered collection of values. Needless to say, that approach is not
optimal.
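
To make that round trip concrete, here is a minimal sketch of appending to a
list stored as a single value. It assumes JSON as the serialization format and
uses a plain Ruby Hash as a stand-in for the database; append_to_list is a
made-up helper name, not part of any client library. Against a real store,
steps 1 and 4 would each be a network round trip.

```ruby
require 'json'

# Append one item to a list that is stored as a single serialized value.
# `store` is anything that responds to [] and []=; a Hash stands in for the
# database here (an assumption for illustration only).
def append_to_list(store, key, item)
  raw  = store[key]                   # 1. retrieve the value by key
  list = raw ? JSON.parse(raw) : []   # 2. deserialize it (empty list on first write)
  list << item                        # 3. add the new item
  store[key] = JSON.generate(list)    # 4. serialize and persist the whole list again
  list
end

store = {}
append_to_list(store, 'uuids', 'a1')
append_to_list(store, 'uuids', 'b2')
puts store['uuids']
```

Every append rewrites the entire list, so the cost grows with the list, and
two writers doing this at the same time can overwrite each other's append.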


The key filtering approach that Alex mentioned is an in-memory filter, and
you don't necessarily have to provide a reduce phase. For example, if you
have a bucket that contains stock information and the key is something like
'YYYY-MM-DD-ticker', you could use a key filter to get all the keys for 2010,
or combine multiple key filters to get all of the keys for 2010 and MSFT
(there's no need, the stock price has been flat for 11 years).
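
As a rough sketch, the job description for those two cases could be built like
this. It only constructs the JSON you would POST to Riak's /mapred resource;
it does not talk to a server. key_filter_job is a made-up helper name, the
sample key shape ('2010-03-15-MSFT') is an assumption based on the pattern
above, and the single map phase assumes the stored values are JSON.

```ruby
require 'json'

# Build a MapReduce job whose inputs are narrowed by key filters.
# There is one map phase and no reduce phase.
def key_filter_job(bucket, filters)
  {
    'inputs' => { 'bucket' => bucket, 'key_filters' => filters },
    'query'  => [
      { 'map' => { 'language' => 'javascript', 'name' => 'Riak.mapValuesJson' } }
    ]
  }
end

# Every key for 2010.
all_of_2010 = [['starts_with', '2010-']]

# Keys for 2010 that are also MSFT: "and" combines two filter chains.
msft_in_2010 = [['and', [['starts_with', '2010-']], [['ends_with', '-MSFT']]]]

puts JSON.pretty_generate(key_filter_job('stocks', all_of_2010))
puts JSON.pretty_generate(key_filter_job('stocks', msft_in_2010))
```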

I did some quick analysis. I'm not sure why it happens (apart from the time
needed to fill a buffer), but here are the results I saw when listing keys
without streaming, listing keys with streaming, and listing keys on an empty
bucket. Times are in seconds; the last column is wall-clock time.

                  user     system      total        real
list_keys     0.020000   0.000000   0.020000 (  3.254437)
stream_keys   0.030000   0.010000   0.040000 (  0.561119)
empty bucket  0.000000   0.000000   0.000000 (  0.664574)


My test may be completely flawed. Just so you can check it out, here's the
Ruby code I used.

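require 'benchmark'
require 'riak'

# HTTP client on port 8091, using the Excon backend.
c = Riak::Client.new(:port => 8091, :http_backend => :Excon)
b = c.bucket('stocks')                  # bucket that holds data
fake_bucket = c.bucket('asdfasdfasdf')  # bucket with no keys in it

# A label width of 12 fits the longest label ('empty bucket').
Benchmark.bm(12) do |x|
  # List keys without streaming.
  x.report('list_keys') {
    keys = b.keys
  }

  # List keys with streaming: the block is called once per batch of keys.
  x.report('stream_keys') {
    b.keys do |list|
      keys = list  # only the most recent batch is kept
    end
  }

  # List keys without streaming on a bucket that has no keys.
  x.report('empty bucket') {
    keys = fake_bucket.keys
  }
end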
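
The benchmark above only reports the total time for each call. If you also
want to know how long streaming takes to hand over its first batch, something
like the following sketch could help. FakeBucket and measure_stream are
made-up names for illustration; FakeBucket just replays canned batches so the
code runs without a server, and you would pass a real bucket object instead to
measure against Riak.

```ruby
# Hypothetical stand-in for a bucket whose #keys yields batches to a block.
class FakeBucket
  def initialize(batches)
    @batches = batches
  end

  def keys
    @batches.each { |batch| yield batch }
  end
end

# Collect every streamed batch and record when the first one arrived.
def measure_stream(bucket)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  first_batch_after = nil
  keys = []

  bucket.keys do |batch|
    first_batch_after ||= Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    keys.concat(batch)
  end

  total = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { :keys => keys, :first_batch_after => first_batch_after, :total => total }
end

result = measure_stream(FakeBucket.new([%w[a b], %w[c], %w[d e]]))
puts "#{result[:keys].size} keys, first batch after " \
     "#{result[:first_batch_after]}s, total #{result[:total]}s"
```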

Jeremiah Peschka
Microsoft SQL Server MVP
MCITP: Database Developer, DBA


On Sat, Jan 22, 2011 at 12:23 PM, Thomas Burdick <
tburd...@wrightwoodtech.com> wrote:

> I guess I'm left even more baffled now: if the keys are all in memory and I
> only have 1 real node in my cluster, why would it take half a second to
> obtain all the keys from a completely empty database? If it takes half a
> second to just list the keys out like that how could a map/reduce ever take
> less time? Doesn't map/reduce need to go through all the keys? Does
> streaming the keys really improve the ability to go through all of them or
> does it just let you incrementally work with them?
>
> There's no obvious way to map meaningful names in this case; the keys are
> just random unique identifiers. In PostgreSQL I'd be using the serial type,
> which clearly would never work in the case of Riak.
>
> So in the case of Riak I've been using UUIDs thus far. So far, in order to
> get any sort of meaningful speed, I just serialize my own Erlang list of
> binary UUIDs to a table. That really isn't that fast either, though; it just
> happens to be faster than list_keys at the moment.
>
> So really, what's the solution to just having a list of like 50k keys that
> can quickly be appended to without taking seconds to then retrieve later on?
> Or is this just not a valid use case for Riak at all? That would suck,
> because again, I really like the notion of an AP-oriented database!
>
> Tom Burdick
>
>
>
> On Sat, Jan 22, 2011 at 10:31 AM, Alexander Sicular <sicul...@gmail.com> wrote:
>
>> Hi Thomas,
>>
>> This is a topic that has come up many times. Lemme just hit a couple of
>> high notes in no particular order:
>>
>> - If you must do a list keys op on a bucket, you must must must use
>> "?keys=stream". True will block on the coordinating node until all nodes
>> return their keys. Stream will start sending keys as soon as the first node
>> returns.
>>
>> - "list keys" is one of the most expensive native operations you can
>> perform in Riak. Not only does it do a full key scan of all the keys in your
>> bucket, but all the keys in your cluster. It is obnoxiously expensive and
>> only more so as the number of keys in your cluster grows. There have been
>> discussions about changing this but everything comes with a cost (more open
>> file descriptors) and I do not believe a decision has been made yet.
>>
>> -Riak is in no way a relational system. It is, in fact, about as opposite
>> as you can get. Incidentally, "select *" is generally not recommended in the
>> Kingdom of Relations and regarded as wasteful. You need a bit of a mind
>> shift from the relational world to have success with NoSQL in general and Riak
>> in particular.
>>
>> -There are no native indices in Riak. By default Riak uses the bitcask
>> backend. Bitcask has many advantages but one disadvantage is that all keys
>> (key length + a bit of overhead) must fit in ram.
>>
>> -Do not use "?keys=true". Your computer will melt. And then your face.
>>
>> -As of Riak 0.14 your m/r can filter on key name. I would highly recommend
>> that your data architecture take this into account by using keys that have
>> meaningful names. This will allow you to not scan every key in your cluster.
>>
>> -Buckets are analogous to relational tables but only just. In Riak, you
>> can think of a bucket as a namespace holder (it is used as part of the
>> default circular hash function) but primarily as a mechanism to
>> differentiate system settings from one group of keys to the next.
>>
>> -There is no penalty for unlimited buckets except for when their settings
>> deviate from the system defaults. By settings I mean things like hooks,
>> replication values and backends among others.
>>
>> -One should list keys by truth if one enjoys sitting in parking lots on
>> the freeway on a scorching summer's day or perhaps waiting in a TSA line at
>> your nearest international point of embarkation surrounded by octomom
>> families all the while juggling between the grope or the pr0n slideshow. If
>> that is for you, use "?keys=true".
>>
>> -Virtually everything in Riak is transient. Meaning, for the most part
>> (not including the 60 seconds or so of m/r cache), there is no caching going
>> on in Riak outside of the operating system. I.e., your subsequent queries will
>> do more or less the same work as their predecessors. You need to cache your
>> own results if you want to reuse them... quickly.
>>
>>
>>
>> Oh, there's more but I'm pretty jelloed from last night. Welcome to the
>> fold, Thomas. Can I call you Tom?
>>
>> Cheers,
>> -Alexander Sicular
>>
>> @siculars
>>
>> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>>
>> > I've been playing around with riak lately as really my first usage of a
>> distributed key/value store. I quite like many of the concepts and
>> possibilities of Riak and what it may deliver, however I'm really stuck on
>> an issue.
>> >
>> > Doing the equivalent of a select * from sometable in riak is seemingly
>> slow. As a quick test I tried...
>> >
>> > http://localhost:8098/riak/mytable?keys=true
>> >
>> > Before even iterating over the keys this was unbearably slow already.
>> This took almost half a second on my machine where mytable is completely
>> empty!
>> >
>> > I'm a little baffled. I would assume that getting all the keys of a
>> table is an incredibly common task? How do I get all the keys of a table
>> quickly? By quickly I mean a few milliseconds or less, as I would expect of
>> even a "slow" RDBMS with an empty table; even with thousands of items, a
>> SQL table can return all of its primary keys in a few milliseconds.
>> >
>> > Tom Burdick
>> >
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
>
>
