That's interesting. Unfortunately I think a smart key would simply shift the problem elsewhere, as we also generate a UUID so clicks can be matched when a conversion event occurs. So we either use the UUID as the key (and can quickly look up a click to see whether it's valid, then fetch the associated data if it is), or we use a smart key (and can easily look up the associated data). Either way, the posting I linked to suggests that unless I supply long lists of keys, I have to scan *all* keys in *all* buckets, no matter how much I segment my data into separate (i.e. smaller) "sub-" buckets.
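To make the trade-off concrete, here's roughly what I think Jeremiah's "same data in multiple buckets" suggestion would mean for us: writing each click twice. This is an untested sketch and the bucket names and fields are invented:

require 'riak'

client = Riak::Client.new

click_uuid    = 'f47ac10b-58cc-4372-a567-0e02b2c3d479'  # example values
placement_key = 'placement-42'
campaign_key  = 'campaign-7'
customer_key  = 'customer-99'

# Canonical record keyed by UUID, so a conversion event can look
# the click up directly.
click = client['clicks'].new(click_uuid)
click.data = { 'placement' => placement_key,
               'campaign'  => campaign_key,
               'customer'  => customer_key,
               'at'        => Time.now.to_i }
click.store

# Second copy keyed by a smart key, so day-bounded jobs only touch
# this bucket and can filter on the timestamp segment of the key.
smart_key = "#{Time.now.to_i}:#{placement_key}:#{campaign_key}:#{customer_key}"
by_time = client['clicks_by_time'].new(smart_key)
by_time.data = { 'uuid' => click_uuid }
by_time.store

The obvious cost is two writes per click, plus keeping the two copies consistent.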
The auto-expire feature will definitely be handy. We'll look at that.

Does anyone have:

(a) opinions on addressing this issue with Ripple's "many" and "one" associations? It seems I could end up with a placement record with millions of clicks attached (see the sketch at the bottom of this message);

(b) any good examples of Ripple's associations being used in real code (as opposed to the examples here: http://seancribbs.github.com/ripple/Ripple/Associations.html), and/or a discussion of Ripple::Document vs. Ripple::EmbeddedDocument?

M.

On Feb 10, 2011, at 9:52 AM, Jeremiah Peschka wrote:

> Riak 0.14 brings key filters - it's still going to take time to filter the
> keys in memory, but it's an in-memory operation. Using 'smart keys' along the
> lines of UNIXTIMESTAMP:placement:campaign:customer you can rapidly filter
> your keys using meaningful criteria and perform MapReduce jobs on the results.
>
> Nothing says you can't also store the same data in multiple buckets in
> multiple formats to make querying easier.
>
> In response to number 2 - there's a way to set Riak to auto-expire data from
> a bucket. It'll only be removed when compactions occur, but if you're storing
> clickstream data that should happen often enough.
>
> --
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
>
> On Thursday, February 10, 2011 at 9:35 AM, Mat Ellis wrote:
>
>> We are converting a MySQL-based schema to Riak using Ripple. We're tracking
>> a lot of clicks, and each click belongs to a cascade of other objects:
>>
>> click -> placement -> campaign -> customer
>>
>> i.e. we do a lot of operations on these clicks grouped by placement or sets
>> of placements.
>>
>> Reading this
>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-July/001591.html
>> gave me pause for thought. I was hoping the time needed to crunch each
>> day's data would be proportional to the volume of clicks on that day, but it
>> seems it would be proportional to the total number of clicks ever.
>>
>> What's the best approach here? I can see a number of 'solutions', each of
>> them complicated:
>>
>> (1) Maintain an index of clicks by day so that we can focus our operations
>> on a time-bound set of clicks
>>
>> (2) Delete or archive clicks once they have been processed or after a
>> certain number of days
>>
>> (3) Add many links to each placement, one per click (millions potentially)
>>
>> On a related noob note, what would be the best way to create the set of
>> clicks for a given placement? MapReduce or Riak Search or some other method?
>>
>> Thanks in advance.
>>
>> M.
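To make question (a) concrete, here is the shape I'm worried about. Untested, and I may be misreading how Ripple's linked associations are stored:

require 'ripple'

class Placement
  include Ripple::Document
  many :clicks   # if this is a linked association, each click adds a
                 # Riak link to this placement object's metadata
end

class Click
  include Ripple::Document
  property :converted_at, Time
  one :placement
end

If each of millions of clicks really does add a link to the one placement object, that object has to be read and rewritten whole every time it changes, which sounds worse than the key-scanning problem.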
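And Jeremiah, to check I've understood the key-filter idea: with smart keys like yours, would pulling one day's clicks look something like this? I'm going from the riak-client docs, so the filter DSL details may be off:

require 'riak'

client = Riak::Client.new

# All clicks for Feb 10 2011 (UTC), assuming keys shaped
# UNIXTIMESTAMP:placement:campaign:customer in a 'clicks_by_time' bucket
mr = Riak::MapReduce.new(client)
mr.filter('clicks_by_time') do
  tokenize ':', 1                   # take the UNIXTIMESTAMP segment
  string_to_int
  between 1297296000, 1297382399    # start and end of the day
end
mr.map('function(v) { return [v.key]; }', :keep => true)
day_of_keys = mr.run

That would at least bound each day's crunching to the keys in one bucket rather than every key we've ever written.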