Riak Users,

One aspect of Riak's interface that has often been discouraged in the past is the listing of all keys in a bucket. This has been for two reasons. The first is that it is necessarily a more heavyweight operation than any of the more targeted get/put/delete operations. The second is that, given the priorities of Riak's earliest users, we had not put much optimization effort into that area. As a result, anything that required getting all keys from a bucket was fairly slow and also fairly heavy in terms of memory consumption.
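
As a rough illustration of why building a complete key list is heavy on memory, here is a minimal Python sketch. It is not Riak code: `fake_bucket_keys` is a hypothetical stand-in for a source of keys, and the two counting functions are assumptions made only to contrast collecting every key up front with consuming keys one at a time.

```python
from typing import Iterator


def fake_bucket_keys(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a bucket's keys; not a Riak API."""
    for i in range(n):
        yield f"key-{i}"


def count_keys_buffered(n: int) -> int:
    """Build the whole key list first: memory grows with the bucket size."""
    keys = list(fake_bucket_keys(n))
    return len(keys)


def count_keys_streamed(n: int) -> int:
    """Consume keys one at a time: only the running count is retained."""
    count = 0
    for _ in fake_bucket_keys(n):
        count += 1
    return count
```

Both functions return the same answer; the difference is that the buffered version holds every key in memory at once, while the streamed version never holds more than one.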

We have put some effort into this recently and seen marked improvement. The changes can be summed up as:

1- bitcask has a new fold_keys operation, which performs far less I/O in most cases than the previous mechanism underlying list_keys.
2- the Riak backend interface to bitcask uses the new fold_keys operation.
3- the mechanism underlying the cluster-wide list_keys operation has changed to require far less total memory in proportion to the list.

Due to these three changes, there are two effective results:

1- In nearly all cases, the list_keys operation is much faster than before. In some common cases it is 10 times faster.
2- In cases of very large buckets, memory allocation will not spike during key listing. (Of course, if you ask Riak to build the whole list for you instead of streaming it out, then at least enough memory to hold that list must be used.)

Note that since map/reduce uses the streaming list_keys under the hood when performing map/reduce over a whole bucket, these changes affect that interface's performance as well.

The described changes are now in the trunks of the relevant repositories, and will be included in the next release.

-Justin

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com