A multi-backend setup with bitcask auto-expiry certainly sounds like the
most future-proof solution.

For our part, we tend to delete keys in map-reduce jobs, since we have more
complex logic for determining when it is time to delete objects. In our
current setup, it takes roughly 3 minutes to go through 1.5M keys, deleting
~10,000 of them. Given the number of keys you mention (50 million to be
deleted, and probably only a fraction of the data you store), that seems too
expensive for your use case: at our rate, just walking 50 million keys would
take over an hour and a half, with 5,000 times as many deletes on top.
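
For concreteness, the shape of such a job, expressed against riakc (the
official Erlang client), is sketched below. The bucket name, the
"expires_at" field, and the JS map function are illustrative assumptions,
not our actual logic; note too that here MapReduce only selects the keys,
with the deletes issued client-side:

    %% Connect to a local node over protocol buffers.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    %% Map phase: emit the key of every object whose stored expires_at
    %% timestamp is in the past; emit nothing for objects that should live.
    Query = [{map,
              {jsanon, <<"function(v) {
                            var o = JSON.parse(v.values[0].data);
                            var now = new Date().getTime() / 1000;
                            return (o.expires_at < now) ? [v.key] : [];
                          }">>},
              none, true}],
    %% Feeding a bare bucket as input walks every key in it (the slow part).
    {ok, [{_, Keys}]} = riakc_pb_socket:mapred(Pid, <<"events">>, Query),
    %% Delete the selected keys one by one.
    [ok = riakc_pb_socket:delete(Pid, <<"events">>, K) || K <- Keys].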

//Daniel

On Fri, Jun 5, 2015 at 12:02 AM, Peter Herndon <tphern...@gmail.com> wrote:

> Well, I’ve been looking to make my theoretical Erlang knowledge less
> theoretical and somewhat more practical, so I wouldn’t say no. And this
> approach is pretty much what we thought we’d use originally.
>
> Since then it has come to light that our product folks have given us
> permission to just delete all the data. But we’re still going to need a
> long-term solution. We may wind up reconfiguring the cluster to use the
> multi-backend solution Sasha proposed.
>
> —Peter
> > On Jun 4, 2015, at 5:54 PM, John O'Brien <boar...@gmail.com> wrote:
> >
> > We've got an expiry worker rig I can likely pass over offline. It's not
> overly clever.
> >
> > Basic idea: stream a feed of keys into a pool of workers that spin off
> delete calls.
> > We feed this based on continuous searches of an expiry TTL field in all
> keys.
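> >
> > Roughly, the pool looks like the sketch below. This is just the shape,
> > not the real rig: the node address, pool size, and round-robin chunking
> > are made up, and the key feed here is a plain list rather than our
> > search stream:
> >
> >     delete_pool(Bucket, Keys, NWorkers) ->
> >         Parent = self(),
> >         %% One worker per chunk, each with its own PB connection.
> >         Pids = [spawn_link(fun() ->
> >                     {ok, C} = riakc_pb_socket:start_link("127.0.0.1", 8087),
> >                     [riakc_pb_socket:delete(C, Bucket, K) || K <- Chunk],
> >                     riakc_pb_socket:stop(C),
> >                     Parent ! {done, self()}
> >                 end) || Chunk <- chunk(Keys, NWorkers)],
> >         %% Wait for every worker to report back.
> >         [receive {done, P} -> ok end || P <- Pids],
> >         ok.
> >
> >     %% Deal the keys round-robin into N sublists.
> >     chunk(Keys, N) ->
> >         Tagged = lists:zip(lists:seq(0, length(Keys) - 1), Keys),
> >         [[K || {I, K} <- Tagged, I rem N =:= R] || R <- lists:seq(0, N - 1)].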
> >
> > It'd likely be better to run this from within the Erlang Riak layer...
> But then there's that whole Erlang thing.
> >
> > J
> >
> > On Jun 4, 2015 1:49 PM, "Peter Herndon" <tphern...@gmail.com> wrote:
> > Mmm, I think we’re looking at deleting about 50 million keys per day.
> That’s a completely back-of-the-envelope estimate; I haven’t done the
> actual math yet.
> >
> > —Peter
> >
> > > On Jun 4, 2015, at 3:28 AM, Daniel Abrahamsson <
> daniel.abrahams...@klarna.com> wrote:
> > >
> > > Hi Peter,
> > >
> > > What is "large-scale" in your case? How many keys do you need to
> delete, and how often?
> > >
> > > //Daniel
> > >
> > > On Wed, Jun 3, 2015 at 9:54 PM, Peter Herndon <tphern...@gmail.com>
> wrote:
> > > Interesting thought. It might work for us, it might not; I’ll have to
> check with our CTO to see whether the expense makes sense under our
> circumstances.
> > >
> > > Thanks!
> > >
> > > —Peter
> > > > On Jun 3, 2015, at 2:21 PM, Drew Kerrigan <d...@kerrigan.io> wrote:
> > > >
> > > > Another idea for a large-scale one-time removal of data, as well as
> an opportunity for a fresh start, would be to:
> > > >
> > > > 1. set up multi-data center replication between 2 clusters
> > > > 2. implement a recv/2 hook on the sink which refuses data from the
> buckets / keys you would like to ignore / delete (sketched below)
> > > > 3. trigger a full sync replication
> > > > 4. start using the sync as your new source of data sans the ignored
> data
> > > >
> > > > Obviously this is costly, but it should have a fairly minimal impact
> on existing production users, other than the moment when you switch traffic
> from the old cluster to the new one.
> > > >
> > > > Caveats: Not all Riak features are supported with MDC (search
> indexes and strong consistency in particular).
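> > > >
> > > > The hook in step 2 might look something like the sketch below. Only
> > > > a sketch: the module and bucket names are made up, and the exact
> > > > callback arguments and return convention should be checked against
> > > > the riak_repl docs for your version:
> > > >
> > > >     -module(drop_old_buckets).
> > > >     -export([recv/2]).
> > > >
> > > >     %% Assumed contract: return ok to accept the replicated object,
> > > >     %% or cancel to refuse writing it on the sink.
> > > >     recv(Object, _Meta) ->
> > > >         case riak_object:bucket(Object) of
> > > >             <<"events-2015-05">> -> cancel;
> > > >             _Other               -> ok
> > > >         end.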
> > > >
> > > > On Wed, Jun 3, 2015 at 2:11 PM Peter Herndon <tphern...@gmail.com>
> wrote:
> > > > Sadly, this is a production cluster already using leveldb as the
> backend. With that constraint in mind, and with rebuilding the cluster to
> enable multi-backend or bitcask not really being an option, what would our
> best approach be?
> > > >
> > > > Thanks!
> > > >
> > > > —Peter
> > > >
> > > > > On Jun 3, 2015, at 12:09 PM, Alexander Sicular <sicul...@gmail.com>
> wrote:
> > > > >
> > > > > We are actively investigating better options for deleting large
> numbers of keys. As Sargun mentioned, deleting the data dir for an entire
> backend via an operationalized rolling restart is probably the best
> approach right now for killing large numbers of keys.
> > > > >
> > > > > But if your key space can fit in memory, the best way to kill keys
> is to use bitcask TTL, if that's an option: 1. if you can even use bitcask
> in your environment, given the memory overhead, and 2. if your use case
> allows for TTLs, which it may, considering you may already be using
> time-bound buckets....
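> > > > >
> > > > > If bitcask is workable, the expiry itself is plain backend config,
> > > > > e.g. in app.config (the one-day TTL below is only illustrative):
> > > > >
> > > > >     {bitcask, [
> > > > >         %% Keys whose last write is more than a day old are
> > > > >         %% treated as expired and reclaimed at merge time.
> > > > >         {expiry_secs, 86400}
> > > > >     ]}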
> > > > >
> > > > > -Alexander
> > > > >
> > > > > @siculars
> > > > > http://siculars.posthaven.com
> > > > >
> > > > > Sent from my iRotaryPhone
> > > > >
> > > > > On Jun 3, 2015, at 09:54, Sargun Dhillon <sdhil...@basho.com>
> wrote:
> > > > >
> > > > >> You could map your keys to a given bucket, and that bucket to a
> given backend using multi_backend. There is some cost to having lots of
> backends (memory overhead, FDs, etc.). When you want to do a mass drop, you
> could take the node down, delete that backend's data, and bring it back up.
> Caveat: neither AAE, MDC, nor mutable data plays well with this scenario.
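> > > > >>
> > > > >> For concreteness, a multi_backend sketch along those lines, in
> > > > >> app.config/advanced.config syntax. The backend names are made up;
> > > > >> buckets are then pointed at a backend via their "backend" bucket
> > > > >> property:
> > > > >>
> > > > >>     {riak_kv, [
> > > > >>         {storage_backend, riak_kv_multi_backend},
> > > > >>         {multi_backend_default, <<"main">>},
> > > > >>         {multi_backend, [
> > > > >>             %% Everything else lives here.
> > > > >>             {<<"main">>, riak_kv_eleveldb_backend, []},
> > > > >>             %% Droppable data lives here; this backend's data dir
> > > > >>             %% can be deleted wholesale while the node is down.
> > > > >>             {<<"droppable">>, riak_kv_bitcask_backend, []}
> > > > >>         ]}
> > > > >>     ]}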
> > > > >>
> > > > >> On Wed, Jun 3, 2015 at 10:43 AM, Peter Herndon <
> tphern...@gmail.com> wrote:
> > > > >> Hi list,
> > > > >>
> > > > >> We’re looking for the best way to handle large-scale expiration
> of no-longer-useful data stored in Riak. We asked a while back, and the
> recommendation was to store the data in time-segmented buckets (bucket per
> day or per month), query the current buckets, and use the streaming list
> keys API to slowly delete the buckets that have aged out.
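> > > > >>
> > > > >> For reference, the slow-delete step we had in mind looks roughly
> > > > >> like the riakc sketch below; the throttle value is our own
> > > > >> assumption, not part of the original recommendation:
> > > > >>
> > > > >>     slow_delete(Pid, Bucket) ->
> > > > >>         {ok, ReqId} = riakc_pb_socket:stream_list_keys(Pid, Bucket),
> > > > >>         drain(Pid, Bucket, ReqId).
> > > > >>
> > > > >>     %% Consume the streamed key batches, deleting as they arrive.
> > > > >>     drain(Pid, Bucket, ReqId) ->
> > > > >>         receive
> > > > >>             {ReqId, {keys, Keys}} ->
> > > > >>                 [riakc_pb_socket:delete(Pid, Bucket, K) || K <- Keys],
> > > > >>                 timer:sleep(100),  %% throttle to spread the load
> > > > >>                 drain(Pid, Bucket, ReqId);
> > > > >>             {ReqId, done} ->
> > > > >>                 ok
> > > > >>         end.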
> > > > >>
> > > > >> Is that still the best approach for doing this kind of task? Or
> is there a better approach?
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> —Peter Herndon
> > > > >> Sr. Application Engineer
> > > > >> @Bitly
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
