Well, I’ve been looking to make my theoretical Erlang knowledge less theoretical and somewhat more practical, so I wouldn’t say no. And this approach is pretty much what we thought we’d use originally.

Since then it has come to light that our product folks have given us permission to just delete all the data. But we’re still going to need a long-term solution. We may wind up reconfiguring the cluster to use the multi-backend solution Sasha proposed.

—Peter
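For the record, a minimal sketch of the worker feed John describes below, assuming the riak-erlang-client (riakc) and a Riak Search (Yokozuna) index; the index name expiry_idx, the numeric field expires_at_i, and the host/port are all illustrative, and a real rig would reuse a pool of connections rather than opening one per delete:

    %% Sketch: search for expired keys, fan deletes out to workers.
    -module(expiry_feed).
    -export([sweep/1]).

    -include_lib("riakc/include/riakc.hrl").

    %% Sweep everything whose expires_at_i field is at or before Now
    %% (seconds since the epoch).
    sweep(Now) ->
        {ok, Conn} = riakc_pb_socket:start_link("127.0.0.1", 8087),
        Query = iolist_to_binary(["expires_at_i:[0 TO ",
                                  integer_to_list(Now), "]"]),
        {ok, Results} = riakc_pb_socket:search(Conn, <<"expiry_idx">>,
                                               Query, [{rows, 1000}]),
        %% Each matched doc carries its bucket/key in _yz_rb/_yz_rk.
        [spawn(fun() -> delete_one(Fields) end)
         || {_Index, Fields} <- Results#search_results.docs],
        riakc_pb_socket:stop(Conn).

    delete_one(Fields) ->
        {ok, Conn} = riakc_pb_socket:start_link("127.0.0.1", 8087),
        Bucket = proplists:get_value(<<"_yz_rb">>, Fields),
        Key    = proplists:get_value(<<"_yz_rk">>, Fields),
        riakc_pb_socket:delete(Conn, Bucket, Key),
        riakc_pb_socket:stop(Conn).

Re-running this on a timer, paginated with a {start, N} search option, keeps each pass bounded.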
> On Jun 4, 2015, at 5:54 PM, John O'Brien <boar...@gmail.com> wrote:
>
> We've got an expiry worker rig I can likely pass over offline. It's not overly clever.
>
> Basic idea: stream a feed of keys into a pool of workers that spin off delete calls. We feed this based on continuous searches of an expiry TTL field in all keys.
>
> It'd likely be better to run this from within the Erlang Riak layer... But then there's that whole Erlang thing.
>
> J
>
> On Jun 4, 2015 1:49 PM, "Peter Herndon" <tphern...@gmail.com> wrote:
> Mmm, I think we’re looking at deleting about 50 million keys per day. That’s a completely back-of-envelope estimate; I haven’t done the actual math yet.
>
> —Peter
>
> > On Jun 4, 2015, at 3:28 AM, Daniel Abrahamsson <daniel.abrahams...@klarna.com> wrote:
> >
> > Hi Peter,
> >
> > What is "large-scale" in your case? How many keys do you need to delete, and how often?
> >
> > //Daniel
> >
> > On Wed, Jun 3, 2015 at 9:54 PM, Peter Herndon <tphern...@gmail.com> wrote:
> > Interesting thought. It might work for us, it might not; I’ll have to check with our CTO to see whether the expense makes sense under our circumstances.
> >
> > Thanks!
> >
> > —Peter
> >
> > > On Jun 3, 2015, at 2:21 PM, Drew Kerrigan <d...@kerrigan.io> wrote:
> > >
> > > Another idea for a large-scale one-time removal of data, as well as an opportunity for a fresh start, would be to:
> > >
> > > 1. set up multi-data center replication between 2 clusters
> > > 2. implement a recv/2 hook on the sink which refuses data from the buckets / keys you would like to ignore / delete
> > > 3. trigger a full sync replication
> > > 4. start using the sink as your new source of data, sans the ignored data
> > >
> > > Obviously this is costly, but it should have a fairly minimal impact on existing production users, other than the moment that you switch traffic from the old cluster to the new one.
> > >
> > > Caveats: not all Riak features are supported with MDC (search indexes and strong consistency in particular).
> > >
> > > On Wed, Jun 3, 2015 at 2:11 PM Peter Herndon <tphern...@gmail.com> wrote:
> > > Sadly, this is a production cluster already using leveldb as the backend. With that constraint in mind, and rebuilding the cluster not really being an option to enable multi-backends or bitcask, what would our best approach be?
> > >
> > > Thanks!
> > >
> > > —Peter
> > >
> > > > On Jun 3, 2015, at 12:09 PM, Alexander Sicular <sicul...@gmail.com> wrote:
> > > >
> > > > We are actively investigating better options for deletion of large amounts of keys. As Sargun mentioned, deleting the data dir for an entire backend via an operationalized rolling restart is probably the best approach right now for killing large amounts of keys.
> > > >
> > > > But if your keyspace can fit in memory, the best way to kill keys is to use bitcask TTL, if that's an option: 1. if you can even use bitcask in your environment, given the memory overhead, and 2. if your use case allows for TTLs, which it may, considering you may already be using time-bound buckets...
> > > >
> > > > -Alexander
> > > >
> > > > @siculars
> > > > http://siculars.posthaven.com
> > > >
> > > > Sent from my iRotaryPhone
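To make Alexander's bitcask TTL and the multi_backend mapping Sargun describes below concrete, an app.config sketch might look like this; the backend names, data_root paths, and 30-day TTL are illustrative, not a drop-in config:

    %% riak_kv section of app.config (sketch)
    {riak_kv, [
        {storage_backend, riak_kv_multi_backend},
        {multi_backend_default, <<"eleveldb_main">>},
        {multi_backend, [
            %% existing leveldb data stays on the default backend
            {<<"eleveldb_main">>, riak_kv_eleveldb_backend,
             [{data_root, "/var/lib/riak/leveldb"}]},
            %% expiring data goes to bitcask with a 30-day TTL
            {<<"bitcask_expiring">>, riak_kv_bitcask_backend,
             [{data_root, "/var/lib/riak/bitcask"},
              {expiry_secs, 2592000}]}
        ]}
    ]}

Individual buckets are pointed at a named backend by setting their backend bucket property, and a mass drop then reduces to the down-the-node, delete-the-data_root, restart procedure Sargun outlines.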
> > > > On Jun 3, 2015, at 09:54, Sargun Dhillon <sdhil...@basho.com> wrote:
> > > >
> > > >> You could map your keys to a given bucket, and that bucket to a given backend using multi_backend. There is some cost to having lots of backends (memory overhead, FDs, etc...). When you want to do a mass drop, you could down the node, delete that given backend, and bring it back up. Caveat: neither AAE, MDC, nor mutable data play well with this scenario.
> > > >>
> > > >> On Wed, Jun 3, 2015 at 10:43 AM, Peter Herndon <tphern...@gmail.com> wrote:
> > > >> Hi list,
> > > >>
> > > >> We’re looking for the best way to handle large-scale expiration of no-longer-useful data stored in Riak. We asked a while back, and the recommendation was to store the data in time-segmented buckets (bucket per day or per month), query on the current buckets, and use the streaming list keys API to handle slowly deleting the buckets that have aged out.
> > > >>
> > > >> Is that still the best approach for doing this kind of task? Or is there a better approach?
> > > >>
> > > >> Thanks!
> > > >>
> > > >> —Peter Herndon
> > > >> Sr. Application Engineer
> > > >> @Bitly
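And for the original question, the stream-and-delete loop itself is small. A minimal sketch, assuming the riak-erlang-client (riakc); the host/port and the sixty-second inactivity timeout are illustrative:

    %% Stream the key list of an aged-out bucket and delete each key.
    -module(bucket_reaper).
    -export([reap/1]).

    reap(Bucket) ->
        {ok, Conn} = riakc_pb_socket:start_link("127.0.0.1", 8087),
        {ok, ReqId} = riakc_pb_socket:stream_list_keys(Conn, Bucket),
        drain(Conn, ReqId, Bucket).

    %% Keys arrive in batches as {ReqId, {keys, Keys}} messages,
    %% followed by {ReqId, done} when the stream is exhausted.
    drain(Conn, ReqId, Bucket) ->
        receive
            {ReqId, {keys, Keys}} ->
                [riakc_pb_socket:delete(Conn, Bucket, Key) || Key <- Keys],
                drain(Conn, ReqId, Bucket);
            {ReqId, done} ->
                riakc_pb_socket:stop(Conn)
        after 60000 ->
            riakc_pb_socket:stop(Conn),
            {error, timeout}
        end.

Listing keys is still a coverage operation that touches every vnode, so at ~50 million deletes a day this wants to run off-peak and throttled; a timer:sleep/1 between batches is the crude version.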