Hi Guillaume,

A few questions first.
What version of Riak? Does the reindexing need to occur across the entire cluster, or just on one node? And what are your expectations about query-ability while reindexing is going on?

If you can afford to take a node out of commission for queries, then one approach would be to delete that node's YZ data and YZ AAE trees, and let AAE sync your 30 million documents from Riak. You can increase AAE tree rebuild and exchange concurrency to make that occur more quickly than it does by default, but that will put a fairly significant load on the node. Moreover, because you have deleted indexed data on one node, you will get inconsistent search results from Yokozuna while it catches up, as the node being reindexed will still show up as part of a coverage plan. Depending on the version of Riak, however, you may be able to manually remove that node from coverage plans through the Riak console while reindexing is going on. The node stays available for Riak get/put operations (including indexing new entries into Solr), but it is excluded from any cover set when a query plan is generated. I can't guarantee that this would take less than 5 days, however.

I've put a few rough sketches of these steps below the quoted message.

-Fred

> On Aug 29, 2016, at 3:56 AM, Guillaume Boddaert <guilla...@lighthouse-analytics.co> wrote:
>
> Hi,
>
> I recently needed to alter my Riak Search schema for a bucket type that contains ~30 million rows. As a result, my index was wiped, since we are waiting for a Riak Search 2.2 feature that will sync Riak storage with the Solr index on such an occasion.
>
> I adapted a script suggested by Evren Esat Özkan there (https://github.com/basho/yokozuna/issues/130#issuecomment-196189344). It is a simple Python script that streams keys and triggers a store action for each item. Unfortunately, it failed past 178k items due to a timeout on the key stream. I calculated that this kind of reindexing mechanism would take up to 5 days to succeed without a crash.
>
> I was wondering whether there is a pure Erlang means to achieve a complete forced rewrite of every single element in my bucket type, rather than an error-prone and very long Python process.
>
> How would you guys reindex a 30 million item bucket type in a fast and reliable way?
>
> Thanks, Guillaume
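On deleting YZ data and bumping AAE concurrency: with the node stopped, the directories to remove are, by default, $platform_data_dir/yz (the Solr index data) and $platform_data_dir/yz_anti_entropy (the YZ AAE trees); double-check search.root_dir and search.anti_entropy.data_dir in your riak.conf if you've moved them. The concurrency knobs I have in mind look roughly like this on a 2.x node. This is a sketch only: the values below are deliberately aggressive examples, not recommendations, so verify the names and defaults against your own riak.conf:

    ## Maximum number of concurrent AAE exchanges and tree builds
    ## allowed on the node (default is 2).
    anti_entropy.concurrency_limit = 8

    ## How many AAE trees may be rebuilt per timespan
    ## (defaults are 1 per 1h).
    anti_entropy.tree.build_limit.number = 4
    anti_entropy.tree.build_limit.per_timespan = 1h

If I remember correctly you can also bump the exchange concurrency without a restart from `riak attach`, with something like `application:set_env(riak_kv, anti_entropy_concurrency, 8).`, though again that's worth verifying on your release.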
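On removing the node from coverage plans: the mechanism I have in mind (verify against your version, as the behaviour has shifted between releases) is to stop advertising the yokozuna service on the node being reindexed, from `riak attach` on that node:

    %% Stop advertising the search service on this node. Other nodes
    %% should then exclude it when computing search coverage plans,
    %% while KV get/put (and hence indexing of new writes into Solr)
    %% continues as normal.
    riak_core_node_watcher:service_down(yokozuna).

    %% Inspect which services this node currently advertises.
    riak_core_node_watcher:services().

When reindexing is complete, restarting the node (or the yokozuna application) should re-register the service and put the node back into coverage.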
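And on the "pure Erlang" question: below is a minimal sketch of a read/rewrite loop using the internal client from `riak attach`. The bucket-type and bucket names are made up, the riak_client API differs slightly between releases, and listing every key in a 30 million item bucket is itself expensive, so treat this as a starting point rather than a recipe:

    %% Get the internal (local) client on this node.
    {ok, C} = riak:local_client().

    %% Hypothetical bucket type / bucket names -- substitute your own.
    B = {<<"my_type">>, <<"my_bucket">>}.

    %% List every key (expensive at this scale), then read and re-write
    %% each object so Yokozuna re-indexes it on the way through.
    {ok, Keys} = C:list_keys(B).
    lists:foreach(fun(K) ->
                      case C:get(B, K) of
                          {ok, Obj} -> C:put(Obj, []);
                          _ -> ok   %% skip missing or errored keys
                      end
                  end, Keys).

This at least avoids the HTTP key-stream timeouts you hit from Python, since everything runs inside the node, but it is still a serial pass over 30 million objects; batching the keys and parallelizing the puts would be the obvious next step if it's too slow.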
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com