We've got an expiry worker rig I can likely pass over offline. It's not
overly clever.

Basic idea: stream a feed of keys into a pool of workers that spin off
delete calls. We feed this with continuous searches on an expiry TTL field
present in all keys.
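
As a rough illustration, here's a minimal sketch of that shape in Python,
assuming the official riak client, a hypothetical search index "expiry_idx",
and an indexed expires_at_l field holding a Unix timestamp (all names are
made up, not our actual rig):

    import time
    from multiprocessing.pool import ThreadPool

    import riak

    client = riak.RiakClient(pb_port=8087)

    def delete_key(bucket_key):
        # Each worker issues a plain delete against the owning bucket.
        bucket_name, key = bucket_key
        client.bucket(bucket_name).delete(key)

    def expired_keys():
        # Feed: search for everything whose expiry timestamp is in the
        # past. Deletes lag in the index, so a real rig re-runs this scan
        # continuously rather than trusting a single pass.
        query = 'expires_at_l:[0 TO %d]' % int(time.time())
        start = 0
        while True:
            results = client.fulltext_search('expiry_idx', query,
                                             start=start, rows=1000)
            docs = results['docs']
            if not docs:
                break
            for doc in docs:
                yield (doc['_yz_rb'], doc['_yz_rk'])
            start += len(docs)

    pool = ThreadPool(16)  # pool of delete workers
    pool.map(delete_key, expired_keys(), chunksize=100)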

It'd likely be better to run this from within the Erlang Riak layer... but
then there's that whole Erlang thing.

J
On Jun 4, 2015 1:49 PM, "Peter Herndon" <tphern...@gmail.com> wrote:

> Mmm, I think we’re looking at deleting about 50 million keys per day.
> That’s a completely back-of-envelope estimate, I haven’t done the actual
> math yet.
>
> —Peter
>
> > On Jun 4, 2015, at 3:28 AM, Daniel Abrahamsson <
> daniel.abrahams...@klarna.com> wrote:
> >
> > Hi Peter,
> >
> > What is "large-scale" in your case? How many keys do you need to delete,
> and how often?
> >
> > //Daniel
> >
> > On Wed, Jun 3, 2015 at 9:54 PM, Peter Herndon <tphern...@gmail.com>
> wrote:
> > Interesting thought. It might work for us, it might not; I'll have to
> check with our CTO to see whether the expense makes sense under our
> circumstances.
> >
> > Thanks!
> >
> > —Peter
> > > On Jun 3, 2015, at 2:21 PM, Drew Kerrigan <d...@kerrigan.io> wrote:
> > >
> > > Another idea for a large-scale one-time removal of data, as well as an
> opportunity for a fresh start, would be to:
> > >
> > > 1. set up multi-data center replication between 2 clusters
> > > 2. implement a recv/2 hook on the sink which refuses data from the
> buckets / keys you would like to ignore / delete
> > > 3. trigger a full sync replication
> > > 4. start using the sync as your new source of data sans the ignored
> data
> > >
> > > Obviously this is costly, but it should have a fairly minimal impact
> on existing production users other than the moment you switch traffic
> from the old cluster to the new one.
> > >
> > > Caveats: Not all Riak features are supported with MDC (search indexes
> and strong consistency in particular).
> > >
> > > On Wed, Jun 3, 2015 at 2:11 PM Peter Herndon <tphern...@gmail.com>
> wrote:
> > > Sadly, this is a production cluster already using leveldb as the
> backend. With that constraint in mind, and rebuilding the cluster not
> really being an option to enable multi-backends or bitcask, what would our
> best approach be?
> > >
> > > Thanks!
> > >
> > > —Peter
> > >
> > > > On Jun 3, 2015, at 12:09 PM, Alexander Sicular <sicul...@gmail.com>
> wrote:
> > > >
> > > > We are actively investigating better options for deleting large
> numbers of keys. As Sargun mentioned, deleting the data dir for an entire
> backend via an operationalized rolling restart is probably the best
> approach right now for killing large numbers of keys.
> > > >
> > > > But if your key space can fit in memory, the best way to kill keys
> is to use the bitcask TTL, if that's an option: 1. if you can even use
> bitcask in your environment, given the memory overhead, and 2. if your
> use case allows for TTLs, which it may, considering you may already be
> using time-bound buckets....
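> > > >
> > > > For what it's worth, bitcask expiry is a one-line setting in
> > > > app.config (a sketch only; the 30-day value is just an example):
> > > >
> > > >     {bitcask, [
> > > >         {data_root, "/var/lib/riak/bitcask"},
> > > >         %% drop entries automatically once they are 30 days old
> > > >         {expiry_secs, 2592000}
> > > >     ]}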
> > > >
> > > > -Alexander
> > > >
> > > > @siculars
> > > > http://siculars.posthaven.com
> > > >
> > > > Sent from my iRotaryPhone
> > > >
> > > > On Jun 3, 2015, at 09:54, Sargun Dhillon <sdhil...@basho.com> wrote:
> > > >
> > > >> You could map your keys to a given bucket, and that bucket to a
> given backend using multi_backend. There is some cost to having lots of
> backends (memory overhead, FDs, etc.). When you want to do a mass drop,
> you could take the node down, delete that backend's data directory, and
> bring it back up. Caveat: neither AAE, MDC, nor mutable data plays well
> with this scenario.
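> > > >>
> > > >> A minimal app.config sketch of that layout (backend names here are
> > > >> made up); buckets opt in via their "backend" bucket property:
> > > >>
> > > >>     {riak_kv, [
> > > >>         {storage_backend, riak_kv_multi_backend},
> > > >>         {multi_backend_default, <<"stable">>},
> > > >>         {multi_backend, [
> > > >>             %% long-lived data lives here
> > > >>             {<<"stable">>, riak_kv_eleveldb_backend,
> > > >>              [{data_root, "/var/lib/riak/leveldb"}]},
> > > >>             %% mass-droppable data; wipe this data_root while the
> > > >>             %% node is down, then bring the node back up
> > > >>             {<<"droppable">>, riak_kv_bitcask_backend,
> > > >>              [{data_root, "/var/lib/riak/bitcask_droppable"}]}
> > > >>         ]}
> > > >>     ]}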
> > > >>
> > > >> On Wed, Jun 3, 2015 at 10:43 AM, Peter Herndon <tphern...@gmail.com>
> wrote:
> > > >> Hi list,
> > > >>
> > > >> We’re looking for the best way to handle large scale expiration of
> no-longer-useful data stored in Riak. We asked a while back, and the
> recommendation was to store the data in time-segmented buckets (bucket per
> day or per month), query on the current buckets, and use the streaming list
> keys API to handle slowly deleting the buckets that have aged out.
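> > > >>
> > > >> A minimal sketch of that streaming-delete pass in Python (bucket
> > > >> name hypothetical; a real job would throttle and retry):
> > > >>
> > > >>     import riak
> > > >>
> > > >>     client = riak.RiakClient(pb_port=8087)
> > > >>     bucket = client.bucket('events-2015-04')  # aged-out bucket
> > > >>
> > > >>     # stream_keys yields keys in batches, so the full key listing
> > > >>     # never has to fit in memory at once
> > > >>     for keys in bucket.stream_keys():
> > > >>         for key in keys:
> > > >>             bucket.delete(key)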
> > > >>
> > > >> Is that still the best approach for doing this kind of task? Or is
> there a better approach?
> > > >>
> > > >> Thanks!
> > > >>
> > > >> —Peter Herndon
> > > >> Sr. Application Engineer
> > > >> @Bitly