MapReduce filtering question

2010-11-19 Thread Parker Thompson
I'm experimenting with Riak by trying to port a simple A/B testing framework
that's currently SQL-backed. Since I'm using Ripple/riak-client, the code
below is in Ruby/JS.

The domain model is fairly simple. I have visitors, which get created for
any user who hits the site. Visitors see alternatives (currently these are
ActiveRecord objects) and are tracked by creating experiences (the joining
of an alternative ID and a visitor). Finally, as visitors do things we track
events, which are distinguished from one another by their classes.

Here is a simplified version of the model code:

class Riak::Visitor
  include Ripple::Document
  many :events,  :class_name => "Riak::Event"
end

class Riak::Event
  include Ripple::Document
end

class Riak::ShareEvent < Riak::Event
  include Ripple::Document
end

class Riak::Experience
  include Ripple::Document
  one :visitor, :class_name => "Riak::Visitor"
  property :alternative_id, Integer, :presence => true
end

My problem is that I'd like to collect the set of visitors who have shared,
or more generally I'd like to return a set of visitors after narrowing down
the list by linking in specific kinds of events. Well, my real problem is
that I still don't quite grok MapReduce, but this is what I'm trying to
accomplish.

The riak-client code is included below (see visitors_who_shared). It returns
a list of all visitors found in the map phase where keep is true. This isn't
surprising, but I'm not sure how to get the visitors if I don't “keep” them
in that phase.
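
To illustrate the "keep" behavior described above, here is a toy, in-memory
sketch in plain Ruby. It does not talk to Riak: `Phase`, `run_phases`, and the
sample data are hypothetical names made up for this example. It assumes, as
the Riak documentation appears to describe, that a job returns the output of
every phase marked keep plus the output of the final phase.

```ruby
# Toy model of phase "keep" semantics. Plain Ruby, no Riak involved.
# Phase, run_phases, and the data below are hypothetical illustration names.
Phase = Struct.new(:name, :fn, :keep)

# Runs the phases in order, feeding each phase's output into the next one.
# Returns the outputs of kept phases only; the last phase is always kept.
def run_phases(inputs, phases)
  kept = []
  phases.each_with_index do |phase, i|
    inputs = inputs.flat_map { |item| phase.fn.call(item) }
    last = (i == phases.size - 1)
    kept << inputs if phase.keep || last
  end
  kept
end

# Hypothetical sample data: event kinds recorded per visitor key.
EVENT_KINDS_BY_VISITOR = {
  "v1" => ["share", "click"],
  "v2" => ["click"],
}

phases = [
  # Kept early phase: emits every visitor, before any filtering happens.
  Phase.new(:visitors, ->(v) { [v] }, true),
  # Follows each visitor to its events (the visitor key is lost here).
  Phase.new(:events, ->(v) { EVENT_KINDS_BY_VISITOR.fetch(v, []) }, false),
  # Filters down to share events only.
  Phase.new(:shares, ->(e) { e == "share" ? [e] : [] }, false),
]

results = run_phases(["v1", "v2"], phases)
p results  # => [["v1", "v2"], ["share"]]
```

The kept visitor phase contributes all visitors, unfiltered, because the
filtering only happens in later phases, and those phases no longer know which
visitor an event came from.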

Thanks in advance for any help. I'm also happy to RTFM and would appreciate
specific suggestions for doing nontrivial MR jobs in JavaScript.

class Riak::Alternative #not a riak doc
  attr_accessor :ar_id

  def initialize(ar_id)
    self.ar_id = ar_id
  end

  def visitors_who_shared
    Riak::MapReduce.new(Ripple.client).
      add("riak_experiences").
      map(map_filter_by_alternative).
      link(:bucket => 'riak_visitors', :keep => true).
      link(:bucket => 'riak_events').
      map("function(v){ return [[v.bucket, v.key]]; }").
      map(map_share_events).
      run
  end

  def map_share_events
f = <___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: MapReduce filtering question

2010-11-19 Thread Parker Thompson
Thanks, a few questions inline...

On Fri, Nov 19, 2010 at 2:43 PM, Sean Cribbs wrote:

> class Riak::Alternative
>   include Ripple::Document
>   many :visitors, :class_name => "Riak::Visitor"
>   property :alternative_id, Integer, :presence => true
>   key_on :alternative_id
> end
>

If I expect to write large numbers of visitor->alternative links, is it
performant to write them all as links on one object, as opposed to
creating many experience docs, each with a link? Naïvely, I would assume
this might distribute write load less evenly, or degrade as the size of the
link data grows. Does this matter?


> 
>
> def visitors_who_shared
>   Riak::MapReduce.new(Ripple.client).
>     add("riak_alternatives", ar_id.to_s).
>     link(:bucket => 'riak_visitors').
>     map(link_to_events_forward_visitor).
>     map(map_share_events_to_visitor).
>     reduce(["riak_kv_mapreduce", "reduce_set_union"]).
>     map(map_identity, :keep => true).
>     run
> end
>

Ah, I was looking for a set_union.  Is there a full list of these functions
hiding somewhere?
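
For reference, a minimal plain-Ruby sketch of what the quoted pipeline appears
to do, using in-memory hashes instead of Riak. The bodies of
link_to_events_forward_visitor and map_share_events_to_visitor are not shown
in the thread, so the stand-ins below are guesses at their intent, and the
sample data and the :type field are hypothetical. The set union is modeled
as simple de-duplication.

```ruby
# Plain-Ruby stand-in for the quoted MapReduce job. No Riak involved.
# All data and the :type field are hypothetical illustration values.
VISITORS_BY_ALTERNATIVE = { "1" => ["v1", "v2", "v3"] }
EVENTS_BY_VISITOR = {
  "v1" => [{ type: "Riak::ShareEvent" }, { type: "Riak::ShareEvent" }],
  "v2" => [{ type: "Riak::Event" }],
  "v3" => [{ type: "Riak::ShareEvent" }],
}

# Guess at link_to_events_forward_visitor: follow a visitor to its events,
# but carry the visitor key along with each event so it is not lost.
def events_with_visitor(visitor_key)
  EVENTS_BY_VISITOR.fetch(visitor_key, []).map { |event| [visitor_key, event] }
end

# Guess at map_share_events_to_visitor: emit the visitor key for share
# events and nothing for any other kind of event.
def share_event_to_visitor(visitor_key, event)
  event[:type] == "Riak::ShareEvent" ? [visitor_key] : []
end

# Stand-in for reduce_set_union: collapse duplicate keys.
def set_union(keys)
  keys.uniq
end

def visitors_who_shared(alternative_id)
  visitors = VISITORS_BY_ALTERNATIVE.fetch(alternative_id, [])
  pairs = visitors.flat_map { |v| events_with_visitor(v) }
  keys = pairs.flat_map { |v, event| share_event_to_visitor(v, event) }
  set_union(keys)
end

p visitors_who_shared("1")  # => ["v1", "v3"]
```

The point of the sketch is that the visitor key travels with each event
through the filtering step, so no early phase needs to be kept; "v1" shared
twice but appears once after the union.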

Thanks,

pt.