Good idea, thanks. M.
On Feb 10, 2011, at 4:10 PM, Alexander Sicular wrote:

> I would change the model and have another stream for "converted" clicks.
>
> -Alexander Sicular
>
> @siculars
>
> On Feb 10, 2011, at 5:58 PM, Mat Ellis wrote:
>
>> Thanks Bryan, that certainly looks interesting. The clicks are amended,
>> but just once, and only a tiny percentage of them (when they convert).
>> We're basically doing what you describe: taking a click stream and
>> processing it once into a set of summary tables for reporting & decision
>> making. We'll take a look at it as soon as we've finished getting our
>> heads around the Ripple goodness.
>>
>> Cheers
>>
>> M.
>>
>> On Feb 10, 2011, at 11:54 AM, Bryan Fink wrote:
>>
>>> On Thu, Feb 10, 2011 at 12:35 PM, Mat Ellis <m...@tecnh.com> wrote:
>>>> We are converting a MySQL-based schema to Riak using Ripple. We're
>>>> tracking a lot of clicks, and each click belongs to a cascade of
>>>> other objects:
>>>>
>>>> click -> placement -> campaign -> customer
>>>>
>>>> i.e. we do a lot of operations on these clicks grouped by placement
>>>> or sets of placements.
>>> … snip …
>>>> On a related noob note, what would be the best way of creating a set
>>>> of the clicks for a given placement? Map/reduce or Riak Search or
>>>> some other method?
>>>
>>> Hi, Mat. I have an alternative strategy I think you could try if
>>> you're up for stepping outside of the Ripple interface. Your incoming
>>> clicks reminded me of other stream data I've processed before, so the
>>> basic idea is to store clicks as a stream, and then process that
>>> stream later. The tools I'd use to do this are Luwak[1] and
>>> luwak_mr[2].
>>>
>>> First, store all clicks, as they arrive, in one Luwak file (or maybe
>>> one Luwak file per host accepting clicks, depending on your service's
>>> arrangement). Luwak has a streaming interface that's available
>>> natively in distributed Erlang, or over HTTP using chunked transfer
>>> encoding. Roll over to a new file on whatever convenient trigger you
>>> like (time period, timeout, manual intervention, etc.). [A sketch of
>>> this write path appears below the quoted thread.]
>>>
>>> Next, use map/reduce to process the stream. The luwak_mr utility lets
>>> you specify a Luwak file by name, and it handles tossing each of the
>>> chunks of that file to various cluster nodes for processing. The
>>> first stage of your map/reduce query just needs to be able to handle
>>> any single chunk of the file. [A sketch of this step also appears
>>> below.]
>>>
>>> I've posted a few examples of how to use the luwak_mr
>>> utility.[3][4][5] They deal with analyzing events in baseball games
>>> (another sort of stream of events).
>>>
>>> Pros:
>>> - No need to list keys.
>>> - The time to process a day's data should be proportional to the
>>>   number of clicks on that day (i.e. proportional to the size of the
>>>   file).
>>>
>>> Caveats:
>>> - Luwak works best with write-once data. Modifying a block of a
>>>   Luwak file after it has been written causes the block to be copied,
>>>   and the old version of the block is not deleted. (Even if some of
>>>   your data is modification-heavy, this might work for the
>>>   non-modified parts … like the key list for a day's clicks?)
>>> - I don't have good numbers for Luwak's speed/efficiency.
>>> - I've only recently started experimenting with Luwak in this
>>>   map/reducing manner, so I'm not sure whether there are other
>>>   pitfalls.
>>>
>>> [1] http://wiki.basho.com/Luwak.html
>>> [2] http://contrib.basho.com/luwak_mr.html
>>> [3] http://blog.beerriot.com/2011/01/16/mapreducing-luwak/
>>> [4] http://blog.basho.com/2011/01/20/baseball-batting-average%2c-using-riak-map/reduce/
>>> [5] http://blog.basho.com/2011/01/26/fixing-the-count/
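Two illustrative sketches follow, prompted by the two steps Bryan outlines above; neither is code from the thread. First, the write path: storing a day's click records as a write-once Luwak file. This assumes the native Luwak API bundled with Riak 0.14 (riak:local_client/0, luwak_file:create/3, luwak_io:put_range/4 -- check the Luwak wiki page [1] for the exact signatures); the file name and pipe-delimited record format are made up for illustration.

    %% Sketch only: store one day's clicks as a single write-once
    %% Luwak file, run from a Riak node's Erlang console. Assumes the
    %% Luwak API shipped with Riak 0.14; names and record format are
    %% hypothetical.
    {ok, Client} = riak:local_client(),

    FileName = <<"clicks-2011-02-10">>,
    Clicks = <<"1297354205|placement42|campaign7|customer3\n",
               "1297354211|placement42|campaign7|customer9\n">>,

    {ok, File} = luwak_file:create(Client, FileName, dict:new()),
    {ok, _WrittenBlocks} = luwak_io:put_range(Client, File, 0, Clicks).

Writing each file exactly once and rolling over to a new name per day also sidesteps the block-copying caveat Bryan mentions.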
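Second, the processing side. The {modfun, luwak_mr, file, FileName} input spec is from the luwak_mr README [2], and reading a block's payload with luwak_block:data/1 follows the baseball examples [3][4][5]; the click_stats module, its function names, and the record format are hypothetical. A minimal sketch that counts clicks per placement:

    %% Sketch only: map/reduce over a Luwak file, one block at a time,
    %% counting clicks per placement. The luwak_mr input spec is from
    %% its README; this click_stats module itself is hypothetical.
    -module(click_stats).
    -export([run/1, count_map/3, sum_reduce/2]).

    run(FileName) ->
        {ok, Client} = riak:local_client(),
        Client:mapred({modfun, luwak_mr, file, FileName},
                      [{map, {modfun, ?MODULE, count_map}, none, false},
                       {reduce, {modfun, ?MODULE, sum_reduce}, none, true}]).

    %% The map phase sees one Luwak block at a time.
    count_map(Block, _Offset, _Arg) ->
        Lines = binary:split(luwak_block:data(Block), <<"\n">>,
                             [global, trim]),
        [dict:to_list(lists:foldl(fun count_line/2, dict:new(), Lines))].

    count_line(Line, Counts) ->
        case binary:split(Line, <<"|">>, [global]) of
            [_Ts, Placement | _] -> dict:update_counter(Placement, 1, Counts);
            _Partial -> Counts  %% line split across a block boundary; see note
        end.

    %% The reduce phase merges the per-block counts.
    sum_reduce(CountLists, _Arg) ->
        Merge = fun({P, N}, Acc) -> dict:update_counter(P, N, Acc) end,
        [dict:to_list(lists:foldl(Merge, dict:new(),
                                  lists:flatten(CountLists)))].

One real-world wrinkle the sketch ignores: a record can straddle a Luwak block boundary, so a block may begin or end with a partial line. The sketch simply drops those; Bryan's "fixing the count" post [5] shows how to stitch boundary-spanning records back together.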
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com