> The word everywhere is to avoid key filters. It effectively does a 
> whole-bucket key-listing, and that starts to get seriously slow out past 100k 
> items. Since you say test queries work I'll presume you've debugged your map 
> and reduce on some queries where you manually add a set of keys. (Right?)

As a side note: with the Erlang PB client you can still apply key filters
to the results of a 2i query, as long as the riak_kv_mapred_filters module
is on your client's code path (here via the extra -pa flag).

➜  riak-erlang-client git:(master) ✗ erl -pa ebin -pa deps/*/ebin -pa ~/src/riak/deps/riak_kv/ebin
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:2:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.1  (abort with ^G)
1> O1 = riakc_obj:new(<<"test">>, <<"abc/def/1">>, []),
1> O2 = riakc_obj:new(<<"test">>, <<"abc/def/2">>, []),
1> O3 = riakc_obj:new(<<"test">>, <<"hij/klm/1">>, []),
1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
1> riakc_pb_socket:put(Pid, O1), riakc_pb_socket:put(Pid, O2), riakc_pb_socket:put(Pid, O3).
ok
2> Index = {index, <<"test">>, <<"$key">>, <<0>>, <<255>>},
2> {ok, Filter} = riak_kv_mapred_filters:build_filter([[<<"ends_with">>,"1"]]),
2> MR = [
2>   { reduce
2>   , {qfun, fun(X, F) -> lists:filter(fun({A, B}) -> F(B) end, X) end}
2>   , riak_kv_mapred_filters:compose(Filter)
2>   , true}],
2> riakc_pb_socket:mapred(Pid, Index, MR).
{ok,[{0,
      [{<<"test">>,<<"hij/klm/1">>},
       {<<"test">>,<<"abc/def/1">>}]}]}
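
For readers without an Erlang shell at hand, here is a minimal offline sketch in Python of what the reduce phase above is doing: the `$key` range from <<0>> to <<255>> feeds practically every bucket/key pair into the reduce, and the composed filter keeps only the keys ending in "1". Nothing here talks to Riak; `ends_with` and `reduce_filter` are hypothetical stand-ins written for illustration, not functions from any Riak client.

```python
# Offline sketch: no Riak connection is made here.
# `pairs` stands in for the {Bucket, Key} pairs the `$key` index range returns.


def ends_with(suffix):
    """Build a predicate on keys, playing the role of the composed key filter."""
    return lambda key: key.endswith(suffix)


def reduce_filter(pairs, key_pred):
    """Keep only the (bucket, key) pairs whose key satisfies the predicate."""
    return [(bucket, key) for bucket, key in pairs if key_pred(key)]


# The three objects stored in the session above.
pairs = [
    (b"test", b"abc/def/1"),
    (b"test", b"abc/def/2"),
    (b"test", b"hij/klm/1"),
]

matches = reduce_filter(pairs, ends_with(b"1"))
print(matches)
```

Run as-is, it prints the two pairs whose keys end in "1", the same keys as in the shell output above (order aside).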

Olav

2012/10/14, Adam Lindsay <a...@alum.mit.edu>:
> Hi David,
>
> The word everywhere is to avoid key filters. It effectively does a
> whole-bucket key-listing, and that starts to get seriously slow out past
> 100k items. Since you say test queries work I'll presume you've debugged
> your map and reduce on some queries where you manually add a set of keys.
> (Right?)
>
> Since you're on LevelDB, it means you can use secondary indices ("2i") to
> drive these queries.
>
> I don't have access to your filter_map, so I don't have access to how you
> construct your keys, but if you have 2i turned on, then you get the first
> key-field "for free" from 2i.
>
> Let's say, hypothetically, that your keys are constructed as:
>  keyprefix:<date>:<country>:<campaign_id>
>
> Well, you can then rewrite the query input as:
>
> def main():
>     client = riak.RiakClient(host=riak_host,
>         port=8087,transport_class=riak.transports.pbc.RiakPbcTransport)
>     query = client.index(
>                     bucket,
>                     '$key',
>                     'keyprefix:201210',
>                     'keyprefix:201210~')
>     query.map('''function(value, keyData, arg) { ... }''')
>     …
>
>
>
> That's fine as far as it goes, but it doesn't solve the problem of querying
> country or campaign id, right?
>
> As a temporary measure, I'd suggest trying your key filters, cranking up the
> timeout to something on the order of hours (I gave 5 minutes conservatively
> and arbitrarily), and going ahead and running it for however long it takes.
>
>
> If those queries do give good results, I'd suggest going ahead and
> re-indexing your existing entries with 'country_bin' and 'campaign_bin'.
> It's up to personal style whether you treat dates as int or bin.
>
> There are lots of tricks and further discussion on how best to get at every
> corner of your data, but how does this strike you so far?
> --
> Adam Lindsay
>
>
> On Sunday, 14 October 2012 at 12:57, David Montgomery wrote:
>
>> Hi,
>>
>> Below is my code for running a map reduce in Python. I have a six-node
>> cluster, 2 cores each with 4 GB of RAM. There is no load, about 3 million
>> keys, and I am using LevelDB with Riak 1.2. The query below is taking a
>> terribly long time. It never finished, and I don't even know how to check
>> whether it is still running, other than that the Python script has not
>> timed out. The number of executed mappers in the stats is flat-lined in
>> Graphite. On test queries the code below works.
>>
>> So... how do I debug what is going on?
>>
>>
>> def main():
>>     client = riak.RiakClient(host=riak_host, port=8087,
>>         transport_class=riak.transports.pbc.RiakPbcTransport)
>>     query = client.add(bucket)
>>     filters = (key_filter.tokenize(":", filter_map['date'])
>>                + key_filter.starts_with('201210'))
>>     #& key_filter.tokenize(":", filter_map['country']).eq("US") \
>>     #& key_filter.tokenize(":", filter_map['campaign_id']).eq("t1") \
>>     query.add_key_filters(filters)
>>
>>     query.map('''
>>     function(value, keyData, arg) {
>>         var data = Riak.mapValuesJson(value)[0];
>>
>>         if (data['adx'] == 'gdn') {
>>             var alt_key = data['hw'];
>>             var obj = {};
>>             obj[alt_key] = 1;
>>             return [ obj ];
>>         } else {
>>             return [];
>>         }
>>     }''')
>>
>>     query.reduce('''
>>     function(values, arg) {
>>         return [ values.reduce(function(acc, item) {
>>             for (var state in item) {
>>                 if (acc[state])
>>                     acc[state] += item[state];
>>                 else
>>                     acc[state] = item[state];
>>             }
>>             return acc;
>>         })];
>>     }
>>     ''')
>>
>>     for result in query.run(timeout=300000):
>>         print result
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
>
>
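
To see why the quoted `$key` range ('keyprefix:201210' to 'keyprefix:201210~') picks out the October 2012 keys, here is a small offline sketch in Python. It assumes the hypothetical key layout keyprefix:<date>:<country>:<campaign_id> from the quoted message, a full YYYYMMDD date, plain ASCII keys, and an inclusive byte-wise range comparison. `in_key_range` and `index_fields` are illustrative helpers, not part of the Riak client. The second helper shows one way the fields could be pulled out of a key when re-indexing with 'country_bin' and 'campaign_bin'; the name 'date_bin' is an assumption added here.

```python
# Offline sketch: plain string comparisons, no Riak connection.


def in_key_range(key, start, end):
    """Inclusive lexicographic range test (byte-wise for ASCII keys)."""
    return start <= key <= end


def index_fields(key):
    """Split a key of the assumed form keyprefix:<date>:<country>:<campaign_id>."""
    _prefix, date, country, campaign_id = key.split(":")
    # 'date_bin' is a made-up index name; the other two come from the thread.
    return {"date_bin": date, "country_bin": country, "campaign_bin": campaign_id}


# Hypothetical sample keys, invented for this illustration.
keys = [
    "keyprefix:20120930:US:t1",
    "keyprefix:20121001:US:t1",
    "keyprefix:20121014:DE:t2",
    "keyprefix:20121101:US:t1",
]

# '~' (0x7E) sorts after digits, letters and ':', so it closes the prefix range.
start, end = "keyprefix:201210", "keyprefix:201210~"
october = [k for k in keys if in_key_range(k, start, end)]

print(october)
print(index_fields(october[-1]))
```

Only the two October keys fall inside the range; the September and November keys sort before the start and after the end respectively.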

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
