Another idea for a large-scale one-time removal of data, as well as an
opportunity for a fresh start, would be to:

1. set up multi-datacenter (MDC) replication between two clusters
2. implement a recv/2 hook on the sink which refuses data from the
   buckets/keys you would like to ignore/delete (see the sketch below)
3. trigger a fullsync replication
4. start using the sink as your new source of data, sans the ignored data
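To make step 2 concrete, here is a minimal sketch of such a hook. The
module name and bucket list are hypothetical, and the exact callback
signature (recv/2, per the description above) and the way the hook gets
registered vary between Riak releases, so verify against the replication
hooks documentation for your version before relying on this:

    -module(drop_ignored_buckets_hook).
    -export([recv/2, send/2, send_realtime/2]).

    %% Buckets to refuse on the sink (example names only).
    -define(IGNORED_BUCKETS, [<<"events-2014-01">>, <<"events-2014-02">>]).

    %% Called on the sink for each replicated object; returning 'cancel'
    %% drops the object instead of writing it, 'ok' lets it through.
    recv(Object, _Props) ->
        case lists:member(riak_object:bucket(Object), ?IGNORED_BUCKETS) of
            true  -> cancel;
            false -> ok
        end.

    %% Pass everything through unchanged on the source side.
    send(_Object, _RiakClient) -> ok.
    send_realtime(_Object, _RiakClient) -> ok.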

Obviously this is costly, but it should have fairly minimal impact on
existing production users, other than at the moment you switch traffic
from the old cluster to the new one.

Caveats: Not all Riak features are supported with MDC (search indexes and
strong consistency in particular).

On Wed, Jun 3, 2015 at 2:11 PM Peter Herndon <tphern...@gmail.com> wrote:

> Sadly, this is a production cluster already using leveldb as the backend.
> With that constraint in mind, and rebuilding the cluster not really being
> an option to enable multi-backends or bitcask, what would our best approach
> be?
>
> Thanks!
>
> —Peter
>
> > On Jun 3, 2015, at 12:09 PM, Alexander Sicular <sicul...@gmail.com>
> wrote:
> >
> > We are actively investigating better options for deletion of large
> amounts of keys. As Sargun mentioned, deleting the data dir for an entire
> backend via an operationalized rolling restart is probably the best
> approach right now for killing large amounts of keys.
> >
> > But if your keyspace can fit in memory, the best way to kill keys is to
> use Bitcask TTL, if that's an option: 1. if you can even use Bitcask in
> your environment, given the memory overhead, and 2. if your use case
> allows for TTLs, which it may, considering you may already be using
> time-bound buckets....
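For reference, enabling Bitcask TTL is normally a single setting in the
backend configuration. A minimal sketch of the app.config entry, with an
example 30-day expiry and an assumed data_root; newer releases expose the
equivalent as bitcask.expiry in riak.conf, so confirm the exact name for
your version:

    {bitcask, [
        {data_root, "/var/lib/riak/bitcask"},
        %% Expire any entry not rewritten within 30 days (2592000 seconds).
        {expiry_secs, 2592000}
    ]}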
> >
> > -Alexander
> >
> > @siculars
> > http://siculars.posthaven.com
> >
> > Sent from my iRotaryPhone
> >
> > On Jun 3, 2015, at 09:54, Sargun Dhillon <sdhil...@basho.com> wrote:
> >
> >> You could map your keys to a given bucket, and that bucket to a given
> backend using multi_backend. There is some cost to having lots of backends
> (memory overhead, FDs, etc.). When you want to do a mass drop, you could
> take the node down, delete that backend's data, and bring it back up.
> Caveat: neither AAE, MDC, nor mutable data play well with this scenario.
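A minimal sketch of what that bucket-to-backend mapping can look like in
app.config; the backend names and data_root paths are examples only:

    {riak_kv, [
        {storage_backend, riak_kv_multi_backend},
        {multi_backend_default, <<"default_leveldb">>},
        {multi_backend, [
            %% Everything not mapped elsewhere lands here.
            {<<"default_leveldb">>, riak_kv_eleveldb_backend,
             [{data_root, "/var/lib/riak/leveldb"}]},
            %% One backend per time segment; drop its data_root to mass-delete.
            {<<"events_2015_06">>, riak_kv_bitcask_backend,
             [{data_root, "/var/lib/riak/bitcask_2015_06"}]}
        ]}
    ]}

Each bucket is then pointed at its backend via the bucket's backend
property (how bucket properties are set depends on your client and Riak
version), and dropping a whole segment later amounts to removing that
backend's data_root during a rolling restart.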
> >>
> >> On Wed, Jun 3, 2015 at 10:43 AM, Peter Herndon <tphern...@gmail.com>
> wrote:
> >> Hi list,
> >>
> >> We’re looking for the best way to handle large-scale expiration of
> no-longer-useful data stored in Riak. We asked a while back, and the
> recommendation was to store the data in time-segmented buckets (bucket per
> day or per month), query on the current buckets, and use the streaming list
> keys API to handle slowly deleting the buckets that have aged out.
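For what it's worth, a minimal sketch of that listing-and-deleting loop
with the Erlang client (riakc); host, port, and bucket are placeholders,
and in practice you would want to throttle the deletes rather than hammer
the cluster:

    %% Assumes the riak-erlang-client (riakc) is on the code path.
    delete_bucket(Host, Port, Bucket) ->
        {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
        {ok, ReqId} = riakc_pb_socket:stream_list_keys(Pid, Bucket),
        Result = drain(Pid, Bucket, ReqId),
        riakc_pb_socket:stop(Pid),
        Result.

    drain(Pid, Bucket, ReqId) ->
        receive
            {ReqId, {keys, Keys}} ->
                %% Delete each key in this batch, then keep streaming.
                [_ = riakc_pb_socket:delete(Pid, Bucket, K) || K <- Keys],
                drain(Pid, Bucket, ReqId);
            {ReqId, done} ->
                ok;
            {ReqId, {error, Reason}} ->
                {error, Reason}
        end.

Usage would look like delete_bucket("127.0.0.1", 8087, <<"events-2014-12">>),
run against one aged-out bucket at a time.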
> >>
> >> Is that still the best approach for doing this kind of task? Or is
> there a better approach?
> >>
> >> Thanks!
> >>
> >> —Peter Herndon
> >> Sr. Application Engineer
> >> @Bitly
>
>
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
