> As of Riak 0.14 your m/r can filter on key name. I would highly recommend that your data architecture take this into account by using keys that have meaningful names.
> This will allow you to not scan every key in your cluster.

Is this part true? I understood that key filtering just means you don't have to fetch the 'value' from the backend (bitcask or innostore). How would it help with respect to scanning every key? Without a 'secondary index/set' somewhere, you would still need to scan every key in the cluster to find all the keys that match your filter.

Kind Regards
Nev

On 23 January 2011 03:31, Alexander Sicular <sicul...@gmail.com> wrote:

> Hi Thomas,
>
> This is a topic that has come up many times. Lemme just hit a couple of high notes in no particular order:
>
> - If you must do a list keys op on a bucket, you must must must use "?keys=stream". True will block on the coordinating node until all nodes return their keys. Stream will start sending keys as soon as the first node returns.
>
> - "list keys" is one of the most expensive native operations you can perform in Riak. Not only does it do a full key scan of all the keys in your bucket, but all the keys in your cluster. It is obnoxiously expensive and only more so as the number of keys in your cluster grows. There have been discussions about changing this but everything comes with a cost (more open file descriptors) and I do not believe a decision has been made yet.
>
> - Riak is in no way a relational system. It is, in fact, about as opposite as you can get. Incidentally, "select *" is generally not recommended in the Kingdom of Relations and regarded as wasteful. You need a bit of a mind shift from relational world to have success with nosql in general and Riak in particular.
>
> - There are no native indices in Riak. By default Riak uses the bitcask backend. Bitcask has many advantages but one disadvantage is that all keys (key length + a bit of overhead) must fit in ram.
>
> - Do not use "?keys=true". Your computer will melt. And then your face.
>
> - As of Riak 0.14 your m/r can filter on key name.
> I would highly recommend that your data architecture take this into account by using keys that have meaningful names. This will allow you to not scan every key in your cluster.
>
> - Buckets are analogous to relational tables but only just. In Riak, you can think of a bucket as a namespace holder (it is used as part of the default circular hash function) but primarily as a mechanism to differentiate system settings from one group of keys to the next.
>
> - There is no penalty for unlimited buckets except for when their settings deviate from the system defaults. By settings I mean things like hooks, replication values and backends among others.
>
> - One should list keys by truth if one enjoys sitting in parking lots on the freeway on a scorching summer's day or perhaps waiting in a TSA line at your nearest international point of embarkation surrounded by octomom families all the while juggling between the grope or the pr0n slideshow. If that is for you, use "?keys=true".
>
> - Virtually everything in Riak is transient. Meaning, for the most part (not including the 60 seconds or so of m/r cache), there is no caching going on in Riak outside of the operating system. I.e. your subsequent queries will do more or less the same work as their predecessors. You need to cache your own results if you want to reuse them... quickly.
>
> Oh, there's more but I'm pretty jelloed from last night. Welcome to the fold, Thomas. Can I call you Tom?
>
> Cheers,
> -Alexander Sicular
>
> @siculars
>
> On Jan 22, 2011, at 10:19 AM, Thomas Burdick wrote:
>
> > I've been playing around with riak lately as really my first usage of a distributed key/value store. I quite like many of the concepts and possibilities of Riak and what it may deliver, however I'm really stuck on an issue.
> >
> > Doing the equivalent of a select * from sometable in riak is seemingly slow. As a quick test I tried...
> >
> > http://localhost:8098/riak/mytable?keys=true
> >
> > Before even iterating over the keys this was unbearably slow already. This took almost half a second on my machine where mytable is completely empty!
> >
> > I'm a little baffled, I would assume that getting all the keys of a table is an incredibly common task? How do I get all the keys of a table quickly? By quickly I mean a few milliseconds or less as I would expect of even a "slow" rdbms with an empty table, even some tables with 1000's of items can get all the primary keys of a sql table in a few milliseconds.
> >
> > Tom Burdick
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
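
For anyone following along, here is a rough Python sketch of the two requests discussed in this thread: a streaming key listing and an m/r job restricted by a key filter. It only builds the URL and the JSON job body and sends nothing. The host, port and bucket name come from the URL quoted above. The `key_filters` input with a `starts_with` predicate and the built-in `Riak.mapValuesJson` map function are what I understand the 0.14 HTTP interface to accept, so treat them as assumptions and check the docs. The prefix "2011-01" is purely hypothetical, as are the helper names. The sketch says nothing about whether the key list is still walked internally, which is the open question above.

```python
import json
from urllib.parse import urlencode

# Host and port taken from the URL quoted in the thread.
RIAK = "http://localhost:8098"


def list_keys_url(bucket, stream=True):
    """Build the list-keys URL; keys=stream is the variant recommended above."""
    mode = "stream" if stream else "true"
    return f"{RIAK}/riak/{bucket}?{urlencode({'keys': mode})}"


def key_filter_job(bucket, prefix):
    """Build an m/r job body whose inputs are limited to keys starting with
    `prefix` (assumed key-filter syntax, not confirmed by this thread)."""
    return {
        "inputs": {"bucket": bucket, "key_filters": [["starts_with", prefix]]},
        "query": [{"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}],
    }


if __name__ == "__main__":
    print(list_keys_url("mytable"))
    # The JSON below would be POSTed to RIAK + "/mapred" (assumed endpoint).
    print(json.dumps(key_filter_job("mytable", "2011-01")))
```

The job body would be sent as an HTTP POST with a JSON content type. Only the values for keys that pass the filter should reach the map phase, which is the saving described in the question above.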
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com