Re: map/reduce problem

2011-01-04 Thread Dan Reverri
Can you provide your map/reduce queries in a pastie or gist? Daniel Reverri Developer Advocate Basho Technologies, Inc. d...@basho.com On Tue, Jan 4, 2011 at 8:06 AM, Shai Garniek wrote: > I am running a map/reduce query on riak and I'm getting something weird in > the result. > After 2 map p

map/reduce problem

2011-01-04 Thread Shai Garniek
I am running a map/reduce query on Riak and I'm getting something weird in the result. After 2 map phases and one reduce phase I'm getting the following result: ["{\"apps\":[\"11\",\"12\"]}","{\"apps\":[\"1\",\"2\",\"3\",\"4\"]}"] when I'm expecting this result: [{"apps":["11","12"]},{"apps":[
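The reported result is a list whose elements are JSON-encoded strings rather than objects. The message does not show the query, so the cause is not known here; the following is only a client-side sketch, assuming the result has already been parsed into a Python list, that decodes each string element. The name `decode_elements` is hypothetical and not part of any Riak client.

```python
import json

# Result exactly as reported in the message: each element is a JSON string.
raw_result = ['{"apps":["11","12"]}', '{"apps":["1","2","3","4"]}']

def decode_elements(items):
    """Parse every element that is a JSON string; leave other values untouched."""
    decoded = []
    for item in items:
        if isinstance(item, str):
            decoded.append(json.loads(item))
        else:
            decoded.append(item)
    return decoded

result = decode_elements(raw_result)
print(result)
```

Decoding on the client only works around the symptom. If one of the phases serializes its output to a string, returning the object itself from that phase would avoid the double encoding.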

Re: map-reduce Problem ?

2010-11-16 Thread Germain Maurice
On 15/11/10 18:55, Kevin Smith wrote: innostore is moderately bucket-aware right now so I've forked it (http://github.com/kevsmith/innostore) and added bucket-aware key listing. Based on some very basic testing I'm seeing a 2.5x speedup in overall key listing performance compared to the offi

Re: map-reduce Problem ?

2010-11-15 Thread Alexander Sicular
Thanks Dan and Kevin. How about a custom hint file (IIRC, the file that is read into memory with all keys) which stems by bucket? And by stem I mean order... with offsets. Redis does all kinds of stuff like this in memory and persists to disk via an append-only file. So Riak can just grab only s

Re: map-reduce Problem ?

2010-11-15 Thread Kevin Smith
In general, Riak backends combine the bucket name and key into a single value used as the primary key. For the Basho-written backends the combined value is an Erlang tuple of the form {BucketName, Key}. When you list all the keys in a bucket, the backends execute a fold over all their data. The
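The cost of listing keys by folding over everything can be sketched as follows. This is an illustration only, not Riak's implementation: the dictionary `store` is a hypothetical in-memory stand-in for a backend whose primary keys are (bucket, key) pairs, and `list_keys` is an assumed name.

```python
# Hypothetical stand-in for a backend keyed by (bucket, key) pairs.
store = {
    ("videos", "v1"): b"...",
    ("videos", "v2"): b"...",
    ("users", "u1"): b"...",
}

def list_keys(store, bucket):
    """Fold over every stored pair and keep the keys of one bucket.

    Returns the matching keys and the number of pairs examined, to show
    that the work grows with the whole store, not with the bucket size.
    """
    keys = []
    examined = 0
    for stored_bucket, key in store:
        examined += 1
        if stored_bucket == bucket:
            keys.append(key)
    return keys, examined

keys, examined = list_keys(store, "users")
print(keys, examined)
```

Here a bucket holding one key still requires examining all three stored pairs, which is why key listing slows down as the total dataset grows.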

Re: map-reduce Problem ?

2010-11-15 Thread Dan Reverri
The bucket/key pair is passed around in a 2-tuple: https://github.com/basho/riak_kv/blob/0093af40f8ba97038e98dd04dfea70ef889ff213/src/riak_kv_put_fsm.erl#L84 Each backend can manage the buc

Re: map-reduce Problem ?

2010-11-15 Thread Alexander Sicular
So I get that Riak is not bucket aware. When you pass a bucket as an input in an m/r, as Riak sifts through all the keys, how does Riak isolate bucket-specific keys? Are keys stored as /bucket/key internally and there is a string comparison on split(key,'/')? Or is there something else going on? T

Re: map-reduce Problem ?

2010-11-15 Thread Kevin Smith
We are giving some thought to how to do that. The main issue with bitcask's key listing performance is that bitcask is not bucket aware and lacks the notion of secondary indices. Not being bucket aware means bitcask has to examine all bucket/key pairs to find the ones related to a given bucke

Re: map-reduce Problem ?

2010-11-15 Thread Greg Steffensen
Along these lines, are there any ideas floating around about how to speed up the listing of keys in a bucket? For the bitcask backend, it seems like an index of keys-by-bucket ought to be the kind of thing that could be stored in the hints files to speed this up without affecting performance for l

Re: map-reduce Problem ?

2010-11-15 Thread Sean Cribbs
This is possible with Riak's MapReduce but you will likely have increasing difficulty as your dataset grows, because of the impact of needing to list keys in a bucket and then eliminate data points you aren't interested in. In the longer term, there will be improvements to MapReduce such that i

map-reduce Problem ?

2010-11-15 Thread Prometheus WillSurvive
Hi, We have a huge database (around 4 billion records, 30 TB) storing video watch information, i.e. view count, comments, favorites, etc. I want to produce a daily report of view counts for all videos. That means I need to look at 2 days, today and yesterday, and subtract yesterday's view count from today
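The arithmetic of the daily report described above can be sketched in a few lines. This is only an illustration of the subtraction, not a MapReduce job: the function name, the sample data, and the choice to treat a video with no count yesterday as 0 are all assumptions.

```python
def daily_view_deltas(today, yesterday):
    """Return per-video views gained since yesterday.

    Both arguments map a video id to its cumulative view count.
    Assumption: a video missing from yesterday's snapshot had 0 views.
    """
    return {
        video_id: count - yesterday.get(video_id, 0)
        for video_id, count in today.items()
    }

# Hypothetical cumulative counts for two consecutive days.
yesterday_counts = {"video-a": 100, "video-b": 40}
today_counts = {"video-a": 130, "video-b": 40, "video-c": 7}

deltas = daily_view_deltas(today_counts, yesterday_counts)
print(deltas)
```

At the scale mentioned in the message the two snapshots would not fit in a single dictionary, so this shows only the per-video computation that a map or reduce phase would need to perform.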