Okay, so here's what I'm thinking now after reading through some of the M/R docs. Suppose I did the following:
1. Create 2 buckets: one for the K/V pairs and one for changed keys, keyed by a timestamp or bin or something (written by a post-commit hook on the source colo).
2. Replicate both buckets to the remote colo.
3. Use a key filter with M/R to get the keys changed since some time in the past.
4. Run the M/R regularly to publish key changes (probably to a rabbit queue).
5. Have a local consumer read the key changes, then grab the updated values from the first bucket.

I think this will all work. I'm not totally sure about the key filtering, but a second bucket with time-based keys seems like it would work best. I plan to serialize all writes to each bucket, as that is a requirement for auditing. So a single integer key holding the time the entry was written will probably work, with a key filter doing a simple greater-than. I can even overlap the time windows to pick up any late additions caused by replication backlogs, since I only keep track of changed keys and always read the most current value.

I guess the timestamp-based bucket could end up replicating faster than the data bucket, causing data drift (a changed key showing up remotely before its new value does). Hmm, that could be an issue. Maybe a secondary index on time would work better. I believe I need some sort of secondary index, since otherwise iterating over all the entries in a bucket would be costly. I don't know exact numbers, but I'd guess I'm looking at several million K/V pairs per bucket in the worst case, so maybe M/R over that isn't so bad. Is there any speedup from combining 2i with a key filter (can you even create a key filter based on 2i)?

Anyway, still searching for a way to do this efficiently,

-Anthony

On Wed, Apr 04, 2012 at 09:20:04AM -0700, Anthony Molinaro wrote:
>
> On Wed, Apr 04, 2012 at 08:10:29AM -0600, Jon Meredith wrote:
> > Riak does have a last modified field, but it's last modified by client so
> > is deliberately left untouched on replication. Similarly the vclock is not
> > incremented either (the vclocks/siblings from both sides are resolved using
> > the two vclocks).
>
> That's great, as I'd want to know on the far end when the client modified
> it.
>
> > There are no obvious mechanisms for doing what you want currently. I'll
> > think about options and somebody will get back to you.
>
> Is it not possible to use the last modified field in a Map/Reduce? I've
> not actually played with M/R in Riak yet (as I've only ever used it
> previously as a Key/Value store). I'll try to dig into it a bit today,
> but I assumed I could do something to map over all records in a bucket
> checking last modified, and return the set modified since a certain
> time (or better yet put them in a rabbit queue to be consumed by my
> systems which will cache the data).
>
> Alternatively, I could maybe have a second bucket representing the changed
> keys, where each time a key is changed in the primary bucket, I could
> add an entry to the other bucket. I could then replicate that bucket
> and just list keys on the remote side (maybe also deleting so subsequent
> list keys only get changes, but then I think the replicator will replace
> those keys, so I'd have to have some sort of bidirectional replication
> for those buckets, which sounds messy).
>
> Anyway, hopefully someone will have an idea,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <antho...@alumni.caltech.edu>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
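The five-step plan in the first message can be simulated end to end in plain Python. This is only a sketch under stated assumptions. Two dicts stand in for the two buckets, and a `deque` stands in for the rabbit queue. `replicate` is a naive copy rather than Riak replication. Every name (`Colo`, `write`, `changed_keys_since`, `publish`, `consume`) is hypothetical. The sketch shows integer time keys, the greater-than filter, the overlapping re-scan, and a consumer that always reads the current value.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Colo:
    """Hypothetical in-memory stand-in for one colo's two buckets."""
    data: Dict[str, str] = field(default_factory=dict)       # bucket 1: K/V pairs
    changelog: Dict[int, str] = field(default_factory=dict)  # bucket 2: write time -> changed key


def write(colo: Colo, key: str, value: str, ts: int) -> None:
    # Step 1: store the value, then record the change, as a post-commit hook
    # would. Assumes writes are serialized, so every `ts` is unique.
    colo.data[key] = value
    colo.changelog[ts] = key


def replicate(src: Colo, dst: Colo) -> None:
    # Step 2: naive full copy of both buckets to the remote colo.
    dst.data.update(src.data)
    dst.changelog.update(src.changelog)


def changed_keys_since(colo: Colo, since: int) -> List[str]:
    # Step 3: the "greater than" filter over integer changelog keys,
    # returning each changed key once, in order of its first change.
    seen, keys = set(), []
    for ts in sorted(t for t in colo.changelog if t > since):
        key = colo.changelog[ts]
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


def publish(colo: Colo, last_seen: int, overlap: int, queue: Deque[str]) -> int:
    # Step 4: re-scan `overlap` ticks before the previous high-water mark to
    # catch late-replicated entries. The deque stands in for a rabbit queue.
    for key in changed_keys_since(colo, last_seen - overlap):
        queue.append(key)
    # New high-water mark: the newest changelog time seen so far.
    return max([last_seen, *colo.changelog])


def consume(colo: Colo, queue: Deque[str], cache: Dict[str, str]) -> None:
    # Step 5: read whatever value is current, so duplicate keys are harmless.
    # If the changelog entry replicated before its value, this reads the old
    # value (or skips a key with no value yet): the data-drift case, unsolved here.
    while queue:
        key = queue.popleft()
        if key in colo.data:
            cache[key] = colo.data[key]


if __name__ == "__main__":
    local, remote = Colo(), Colo()
    write(local, "a", "1", 100)
    write(local, "b", "2", 101)
    write(local, "a", "3", 102)
    replicate(local, remote)

    queue: Deque[str] = deque()
    cache: Dict[str, str] = {}
    last_seen = publish(remote, last_seen=99, overlap=0, queue=queue)
    consume(remote, queue, cache)
    print(last_seen, cache)  # 102 {'a': '3', 'b': '2'}
```

Because the consumer tracks only keys, re-publishing a key inside the overlap window costs one redundant read. Otherwise it is harmless. The overlap does not fix the drift between the two buckets. It only catches changelog entries that arrive late.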
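The earlier message's idea is to map over every record in a bucket, check its last-modified time, and return the set modified since a given time. It can be sketched as a plain-Python simulation of a map phase and a reduce phase. This is not Riak's M/R API (real Riak phases are written in Erlang or JavaScript). `StoredObject`, `map_modified_since`, `reduce_concat`, `modified_since`, and the integer timestamps are hypothetical.

```python
from functools import reduce
from typing import Dict, List, NamedTuple


class StoredObject(NamedTuple):
    """Hypothetical stand-in for a Riak object: key, value, and the
    client-set last-modified time (an integer tick here, for simplicity)."""
    key: str
    value: str
    last_modified: int


def map_modified_since(obj: StoredObject, since: int) -> List[str]:
    # Map phase: emit the object's key only if it changed after `since`.
    return [obj.key] if obj.last_modified > since else []


def reduce_concat(acc: List[str], keys: List[str]) -> List[str]:
    # Reduce phase: merge the per-object outputs into one list.
    return acc + keys


def modified_since(bucket: Dict[str, StoredObject], since: int) -> List[str]:
    # Every object in the bucket is visited, so the cost grows with the
    # bucket size, not with the number of objects that actually changed.
    mapped = (map_modified_since(obj, since) for obj in bucket.values())
    return sorted(reduce(reduce_concat, mapped, []))


if __name__ == "__main__":
    bucket = {
        "a": StoredObject("a", "v1", 100),
        "b": StoredObject("b", "v2", 200),
        "c": StoredObject("c", "v3", 300),
    }
    print(modified_since(bucket, 150))  # ['b', 'c']
```

The scan visits every object on each run. Its cost therefore tracks the total bucket size rather than the number of changes, which is the concern the later message raises about several million K/V pairs per bucket.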