On Thu, Feb 10, 2011 at 12:35 PM, Mat Ellis <m...@tecnh.com> wrote:
> We are converting a mysql based schema to Riak using Ripple. We're tracking
> a lot of clicks, and each click belongs to a cascade of other objects:
> click -> placement -> campaign -> customer
> i.e. we do a lot of operations on these clicks grouped by placement or sets
> of placements.
… snip …
> On a related noob-note, what would be the best way of creating a set of the
> clicks for a given placement? Map Reduce or Riak Search or some other
> method?
Hi, Mat.

I have an alternative strategy I think you could try if you're up for stepping outside of the Ripple interface. Your incoming clicks reminded me of other stream data I've processed before, so the basic idea is to store clicks as a stream, and then process that stream later. The tools I'd use to do this are Luwak[1] and luwak_mr[2].

First, store all clicks, as they arrive, in one Luwak file (or maybe one Luwak file per host accepting clicks, depending on your service's arrangement). Luwak has a streaming interface that's available natively in distributed Erlang, or over HTTP by exploiting the "chunked" encoding type. Roll over to a new file on whatever convenient trigger you like (time period, timeout, manual intervention, etc.).

Next, use map/reduce to process the stream. The luwak_mr utility will allow you to specify a Luwak file by name, and it will toss each of the chunks of that file to various cluster nodes for processing. The first stage of your map/reduce query just needs to be able to handle any single chunk of the file.

I've posted a few examples of how to use the luwak_mr utility.[3][4][5] They deal with analyzing events in baseball games (another sort of stream of events).

Pros:
 - No need to list keys.
 - The time to process a day's data should be proportional to the number of clicks on that day (i.e. proportional to the size of the file).

Caveats:
 - Luwak works best with write-once data. Modifying a block of a Luwak file after it has been written causes the block to be copied, and the old version of the block is not deleted. (Even if some of your data is modification-heavy, this might work for the non-modified parts … like the key list for a day's clicks?)
 - I don't have good numbers for Luwak's speed/efficiency.
 - I've only recently started experimenting with Luwak in this map/reducing manner, so I'm not sure if there are other pitfalls.
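To make the "one file per period, roll over on a trigger" idea concrete, here's a small Python sketch. To be clear, this is not the Luwak API — the filenames, the JSON record format, and the daily rollover trigger are all my own assumptions — it just shows the append-only, newline-delimited stream shape you'd be writing into a per-day Luwak file:

```python
import json
from datetime import datetime, timezone

def stream_filename(ts, host="web1"):
    # One stream per host per day, e.g. "clicks-web1-2011-02-10.log".
    # The date in the name is the rollover trigger: new day, new file.
    return "clicks-%s-%s.log" % (host, ts.strftime("%Y-%m-%d"))

def append_click(click, ts=None, host="web1"):
    # Append one click as a newline-delimited JSON record.
    # Against Luwak you'd stream these bytes over the HTTP interface
    # instead of writing a local file; the write-once, append-only
    # pattern is the part that matters.
    ts = ts or datetime.now(timezone.utc)
    with open(stream_filename(ts, host), "a") as f:
        f.write(json.dumps(click) + "\n")

append_click({"placement": "p1", "campaign": "c9", "customer": "acme"},
             ts=datetime(2011, 2, 10, tzinfo=timezone.utc))
```

Because each record is only ever appended, you never touch a block after it's written, which is exactly the write-once pattern Luwak is happiest with.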
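The map/reduce side can be sketched the same way. Again, this is not the luwak_mr API (its phases are Erlang functions, and the linked examples[3][5] show how it deals with records that straddle block boundaries); it's a plain-Python stand-in for the shape of a query that counts a day's clicks per placement, where each map call sees one chunk of the file:

```python
import json
from collections import Counter

def map_chunk(chunk):
    # Map phase: called once per file chunk, on whichever node holds it.
    # For simplicity this assumes no record straddles a chunk boundary.
    counts = Counter()
    for line in chunk.splitlines():
        if line.strip():
            counts[json.loads(line)["placement"]] += 1
    return counts

def reduce_counts(partials):
    # Reduce phase: merge the per-chunk counters into one total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Two "chunks" of a day's click stream:
chunks = [
    '{"placement": "p1"}\n{"placement": "p2"}\n',
    '{"placement": "p1"}\n',
]
totals = reduce_counts(map_chunk(c) for c in chunks)
# totals is Counter({"p1": 2, "p2": 1})
```

The point of the structure is that map_chunk needs nothing beyond the bytes of its own chunk, which is what lets luwak_mr fan the chunks out across the cluster without listing any keys.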
[1] http://wiki.basho.com/Luwak.html
[2] http://contrib.basho.com/luwak_mr.html
[3] http://blog.beerriot.com/2011/01/16/mapreducing-luwak/
[4] http://blog.basho.com/2011/01/20/baseball-batting-average%2c-using-riak-map/reduce/
[5] http://blog.basho.com/2011/01/26/fixing-the-count/
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com