I think that CRDT support in Riak will meet your needs. It pushes garbage collection down to Riak itself so you don't have to worry about it.
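To give a feel for why a CRDT needs no application-side sibling cleanup, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. It is an illustration of the general idea only, not Riak's implementation or client API; the class name `GCounter` and the actor names are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one non-negative count per actor (hypothetical names)."""
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, actor: str, amount: int = 1) -> None:
        # Each actor only ever bumps its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[actor] = self.counts.get(actor, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Per-actor maximum: commutative, associative, and idempotent,
        # so replicas converge no matter how often or in what order they merge.
        actors = self.counts.keys() | other.counts.keys()
        return GCounter(
            {a: max(self.counts.get(a, 0), other.counts.get(a, 0)) for a in actors}
        )

    @property
    def value(self) -> int:
        # The counter's value is the sum over all actors.
        return sum(self.counts.values())


# Two replicas accept writes independently...
replica_a = GCounter()
replica_b = GCounter()
replica_a.increment("node_a", 3)
replica_b.increment("node_b", 2)

# ...and converge to the same total regardless of merge order.
merged = replica_a.merge(replica_b)
print(merged.value)  # 5
```

Because the merge is deterministic, the database can apply it on its own instead of handing conflicting versions back to the application.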
The downside is that it is only in Riak 2.0, and not all clients support it yet, or their support is immature. The current support in the .NET client is very low level, for example.

On Sep 26, 2013 4:08 PM, "Brady Wetherington" <br...@bespincorp.com> wrote:

> Oh, I get what the siblings/allow_mult business is for, just wondering if
> I can use it off-label a little, and eventually do 'conflict resolution'
> which would make the results be much more reasonable.
>
> But it sounds like I shouldn't do that. That's totally fine.
>
> Since I'm doing a write-once, update-never environment - I don't see how
> allow_mult would help me otherwise? A new write will always be to a new
> key. There will never be an update. So if that's the case - no need for
> allow_mult. Does that sound right?
>
> -B.
>
>
> On Wed, Sep 25, 2013 at 6:30 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:
>
>> inline.
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>>
>> On Wed, Sep 25, 2013 at 2:47 PM, Brady Wetherington <br...@bespincorp.com> wrote:
>>
>>> I've built a solid proof-of-concept system on leveldb, and use some
>>> 2i indexes in order to search for certain things - usually just for counts
>>> of things.
>>>
>>> I have two questions so far:
>>>
>>> First off, why is Bitcask the default? Is it just because it is faster?
>>> Or is it considered more 'stable' or something?
>>>
>>
>> Long ago, when bitcask was elected as the default, LevelDB was not a
>> thing.
>>
>> Databases strive for stability and the principle of least surprise.
>> Changing anything can potentially introduce performance regressions,
>> stability problems, and a host of other undesirable and
>> reputation-destroying things.
>>
>> Changing the storage back end is high up on the list of things I'd never
>> want to do in a database. Why do you think MySQL still defaults to MyISAM?
>>
>>
>>>
>>> Next, I've learned about the allow_mult feature you can set on buckets.
>>> I wonder if I should use this for my most heavily-used primary-purpose
>>> queries? Is there a limit to how many 'siblings' you can have for an entry?
>>> Is it inadvisable to do what I'm talking about? Would fetching all of the
>>> siblings end up being a disastrous nightmare or something?
>>>
>>
>> The upper limit will depend on the size of your objects. You don't want
>> to have object sizes (including siblings) much beyond 6 MB. Otherwise
>> you'll have a lot of network congestion. You certainly *could* have bigger
>> object + sibling collections, but you'd want to beef up the network backend
>> to something like 10GbE, 40GbE, or InfiniBand to deal with the increased
>> gossip.
>>
>> Fetching all of your siblings is bad if you never resolve siblings since
>> you'll have a lot of data.
>>
>> Allow_mult is typically turned on for production clusters. It is
>> off by default to help new users get a handle on Riak quickly without
>> having to worry about siblings. Once you get the hang of how Riak behaves,
>> turning on siblings is usually a good thing.
>>
>> Depending on resolution, it's probably best to read your data, resolve
>> siblings, and send that garbage-collected object back to Riak - even if
>> you're performing a "read only" query. The new Riak DT features eliminate
>> some of the worry about siblings by pushing the responsibility back down to
>> Riak. Those features are only available if you're building from source, but
>> hopefully Riak 2.0 will be out soon.
>>
>>
>>> I *assume* - and I could be wrong - that a 2i query would be slower than
>>> a fetch-of-siblings for a particular key - is that wrong?
>>>
>>> If I switch from using 2i indexes to using allow_mult and siblings, we'd
>>> be talking a few hundred thousand to low millions for a sibling-count.
>>>
>>
>> I do not think 'siblings' means what you think it means.
>>
>> A sibling would occur if two clients, A and B, read v1 of an object and
>> then issue writes.
>>
>> Client A updates the object and sets preferences to ['cat pictures', 'ham
>> sandwiches']
>> Client B updates the object and sets preferences to ['knitting with bacon']
>>
>> With allow_mult enabled you'd have two versions of the object. These are
>> siblings.
>>
>> If you're thinking of some kind of index created by your application, you
>> could look at 2i vs using siblings to build a secondary index:
>> http://basho.com/index-for-fun-and-for-profit/
>>
>> Even when you're creating your own secondary index, you still want to
>> perform garbage collection on the data you're storing in Riak.
>>
>>
>>> Thanks for making an excellent product! Can't wait to get this bad boy
>>> into production and really see what it can do!
>>>
>>> -B.

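For the Client A / Client B sibling example quoted above, the "read, resolve, write back" step could look roughly like the sketch below. It assumes the application has already fetched the sibling values as plain lists, and it assumes set union is an acceptable merge policy for preferences; `resolve_preferences` is a hypothetical helper, not part of any Riak client.

```python
from typing import Iterable, List


def resolve_preferences(siblings: Iterable[List[str]]) -> List[str]:
    """Merge sibling values into one list by set union (hypothetical policy)."""
    merged = set()
    for prefs in siblings:
        merged.update(prefs)
    # Sort so that every client resolving the same siblings writes back
    # an identical value.
    return sorted(merged)


# The two conflicting versions from the example in the thread.
siblings = [
    ["cat pictures", "ham sandwiches"],  # written by client A
    ["knitting with bacon"],             # written by client B
]

resolved = resolve_preferences(siblings)
print(resolved)  # ['cat pictures', 'ham sandwiches', 'knitting with bacon']
```

The resolved value is what you would write back to Riak so the siblings collapse into a single object. Note that a plain union never drops an entry, so an application that needs to remove preferences would need a different policy.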
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com