Thanks Bryan, that certainly looks interesting. The clicks are amended, but only once and only for a tiny percentage of them (when they convert). We're basically doing what you describe: taking a click stream and processing it once into a set of summary tables for reporting & decision making. We'll take a look at it as soon as we've finished getting our heads around the Ripple goodness.
Cheers

M.

On Feb 10, 2011, at 11:54 AM, Bryan Fink wrote:

> On Thu, Feb 10, 2011 at 12:35 PM, Mat Ellis <m...@tecnh.com> wrote:
>> We are converting a mysql based schema to Riak using Ripple. We're tracking
>> a lot of clicks, and each click belongs to a cascade of other objects:
>>
>> click -> placement -> campaign -> customer
>>
>> i.e. we do a lot of operations on these clicks grouped by placement or sets
>> of placements.
> … snip …
>> On a related noob-note, what would be the best way of creating a set of the
>> clicks for a given placement? Map Reduce or Riak Search or some other
>> method?
>
> Hi, Mat. I have an alternative strategy I think you could try if
> you're up for stepping outside of the Ripple interface. Your incoming
> clicks reminded me of other stream data I've processed before, so the
> basic idea is to store clicks as a stream, and then process that
> stream later. The tools I'd use to do this are Luwak[1] and
> luwak_mr[2].
>
> First, store all clicks, as they arrive, in one Luwak file (or maybe
> one Luwak file per host accepting clicks, depending on your service's
> arrangement). Luwak has a streaming interface that's available
> natively in distributed Erlang, or over HTTP by exploiting the
> "chunked" encoding type. Roll over to a new file on whatever
> convenient trigger you like (time period, timeout, manual
> intervention, etc.).
>
> Next, use map/reduce to process the stream. The luwak_mr utility will
> allow you to specify a Luwak file by name, and it will handle tossing
> each of the chunks of that file to various cluster nodes for
> processing. The first stage of your map/reduce query just needs to be
> able to handle any single chunk of the file.
>
> I've posted a few examples of how to use the luwak_mr
> utility.[3][4][5] They deal with analyzing events in baseball games
> (another sort of stream of events).
>
> Pros:
> - No need to list keys.
> - The time to process a day's data should be proportional to the
>   number of clicks on that day (i.e. proportional to the size of the
>   file).
>
> Caveats:
> - Luwak works best with write-once data. Modifying a block of a
>   Luwak file after it has been written causes the block to be copied,
>   and the old version of the block is not deleted. (Even if some of
>   your data is modification-heavy, this might work for the non-modified
>   parts … like the key list for a day's clicks?)
> - I don't have good numbers for Luwak's speed/efficiency.
> - I've only recently started experimenting with Luwak in this
>   map/reducing manner, so I'm not sure if there are other pitfalls.
>
> [1] http://wiki.basho.com/Luwak.html
> [2] http://contrib.basho.com/luwak_mr.html
> [3] http://blog.beerriot.com/2011/01/16/mapreducing-luwak/
> [4] http://blog.basho.com/2011/01/20/baseball-batting-average%2c-using-riak-map/reduce/
> [5] http://blog.basho.com/2011/01/26/fixing-the-count/

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
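For anyone following the thread who wants to try this outside of Ripple, here is a rough, untested sketch of the first step Bryan describes (write clicks to a Luwak file, rolling over to a new file per host/day), using Luwak's native Erlang interface from a node attached to the cluster. The module and function names (click_store, store_batch) and the newline-delimited record format are made up for illustration; riak:local_client/0, luwak_file:create/3 and luwak_io:put_range/4 are the calls I believe the Luwak contrib exposes (they follow the pattern in [3]), so double-check the signatures against [1]. For true as-they-arrive streaming you'd use Luwak's put-stream interface or the HTTP "chunked" encoding Bryan mentions instead of a single batched write.

%% Untested sketch: write one batch of click records into a new Luwak
%% file.  FileName might be e.g. <<"clicks-host1-2011-02-10">>, rolled
%% over per host and per day; Clicks is a list of binaries, one
%% serialized click per element.  Records are written newline-delimited
%% so a map function can split a chunk on <<"\n">>.
-module(click_store).
-export([store_batch/2]).

store_batch(FileName, Clicks) when is_binary(FileName), is_list(Clicks) ->
    Data = iolist_to_binary([[Click, <<"\n">>] || Click <- Clicks]),
    {ok, Riak} = riak:local_client(),
    {ok, File} = luwak_file:create(Riak, FileName, dict:new()),
    luwak_io:put_range(Riak, File, 0, Data).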
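And a similarly hedged sketch of the map/reduce side, answering the original "set of clicks for a given placement" question by counting them. The {modfun, luwak_mr, file, FileName} input form and luwak_block:data/1 are taken from the baseball examples in [3] and [4]; everything else here (click_mr, the CSV layout with the placement id as the first field, counting as the operation) is illustrative. Note this naive version assumes no click record straddles a block boundary; [5] is exactly about fixing that.

%% Untested sketch: count the clicks for one placement in one Luwak
%% file.  luwak_mr hands each block of the file to the map phase on
%% whichever node holds it; the reduce phase sums the per-block counts.
%% The module has to be on the code path of every node in the cluster.
-module(click_mr).
-export([count_for_placement/2, map_count/3, reduce_sum/2]).

count_for_placement(FileName, PlacementId) ->
    {ok, Client} = riak:local_client(),
    {ok, [Count]} =
        Client:mapred({modfun, luwak_mr, file, FileName},
                      [{map, {modfun, ?MODULE, map_count}, PlacementId, false},
                       {reduce, {modfun, ?MODULE, reduce_sum}, none, true}]),
    Count.

%% Map phase: runs once per Luwak block.  Assumes newline-delimited
%% CSV records whose first field is the placement id (a binary), and
%% that no record spans two blocks (see [5] for handling ones that do).
map_count(LuwakBlock, _Offset, PlacementId) ->
    Data = luwak_block:data(LuwakBlock),
    Lines = binary:split(Data, <<"\n">>, [global, trim]),
    [length([L || L <- Lines,
                  hd(binary:split(L, <<",">>)) =:= PlacementId])].

%% Reduce phase: sum the per-block counts (re-reduces only ever see
%% integers, so a plain sum is safe).
reduce_sum(Counts, _Arg) ->
    [lists:sum(Counts)].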