Hi Fred, thanks for your answer.
I'm using Riak 2.1; see the attached status export.
I'm working on a single cluster and, from time to time, need to update
a search index on all nodes.
As a cloud user, I could consider buying a spare host for a few days in
order to achieve a complete rollout.
I understand your plan to take a host out of production while it
rebuilds its index. From my point of view, however, your solution only
applies to a broken Solr index that needs to be rebuilt from scratch on
a single host.
In my case, I need to reindex my documents because I updated my Solr
schema, which requires wiping the existing index beforehand (create a new
index, change the bucket's index_name prop, drop the old index) on all hosts,
since that's a bucket type property that I need to update.
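The three steps listed above (create the new index, switch the bucket type to it, drop the old index) could be scripted roughly as follows. This is a minimal sketch assuming the official Riak Python client 2.x; the bucket type, schema and index names are hypothetical, and `search_index` is my assumption for the Riak 2.x bucket type property referred to above. None of these steps re-indexes existing objects; that still requires rewriting them.

```python
def migrate_search_index(client, bucket_type, old_index, new_index,
                         schema_name, schema_xml):
    """Point a bucket type at a freshly created search index.

    Sketch only: `client` is assumed to be a riak.RiakClient (2.x); all
    names passed in are hypothetical. Existing objects are NOT re-indexed.
    """
    # Upload the updated Solr schema under its own name.
    client.create_search_schema(schema_name, schema_xml)
    # Create the new index on top of the updated schema.
    # Assumption: index creation may take a moment to reach every node,
    # and switching the property before that may be rejected.
    client.create_search_index(new_index, schema_name)
    # Switch the bucket type to the new index.
    client.bucket_type(bucket_type).set_property("search_index", new_index)
    # Drop the old index, which is no longer referenced.
    client.delete_search_index(old_index)
```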
Fred, can your plan really be applied to an "I want to update my
search schema across my whole cluster" scenario?
At the moment, I have already created the new index and destroyed the old one,
but I am unable to use a slow Python script to force all items to be
written again (and thus pushed to Solr), since I get regular
timeouts on the key streaming API (both Protocol Buffers and HTTP).
Is there a way to run a program inside the Riak nodes (not over HTTP, not
over Protocol Buffers) to achieve this simple algorithm:
    for key in bucket.stream_keys():
        obj = bucket.get(key)
        bucket.store(obj)
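Since the timeouts occur on the key stream rather than on individual writes, one way to make a client-side loop like the one above survive them is to reopen the stream and skip keys that were already rewritten. A minimal sketch, assuming the key source behaves like the Python client's `RiakBucket.stream_keys()`, which as far as I know yields lists of keys rather than single keys. The function name and parameters are hypothetical, and the real client may raise a different exception on timeout, hence the `retry_on` parameter. Each restart still pays for a full key listing.

```python
import time

def rewrite_all(stream_key_batches, rewrite, done=None,
                retry_on=(TimeoutError,), max_restarts=10, pause=1.0):
    """Rewrite every key once, reopening the key stream after a timeout.

    stream_key_batches: zero-argument callable returning an iterable of key
        lists (assumed to behave like RiakBucket.stream_keys()).
    rewrite: callable(key) that re-stores one object, e.g. get + store.
    done: keys already rewritten; kept across restarts so they are skipped.
    """
    done = set() if done is None else done
    restarts = 0
    while True:
        try:
            for batch in stream_key_batches():
                for key in batch:
                    if key in done:
                        continue  # rewritten before an earlier restart
                    rewrite(key)
                    done.add(key)
            return done  # the stream ran to completion
        except retry_on:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up after max_restarts restarts in total
            time.sleep(pause)  # brief back-off before reopening the stream
```

With the Python client, wiring might look like `rewrite_all(bucket.stream_keys, lambda key: bucket.get(key).store())`. For ~30 million keys the `done` set lives in memory; persisting progress to a file would also let a crashed process resume. If the timeouts come from slow per-key writes holding the stream open (I have not verified this), collecting all keys first and rewriting afterwards may avoid them altogether.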
I really fear that I will not be able to restore my index any time soon. I
am not stressed out, because we are not in production yet and I still have
plenty of time to fix this as new data becomes available. But the kind of
complex operations that an index update requires really freaks me out.
Guillaume
On 29/08/2016 14:41, Fred Dushin wrote:
Hi Guillaume,
A few questions.
What version of Riak?
Does the reindexing need to occur across the entire cluster, or just
on one node?
What are the expectations about query-ability while re-indexing is
going on?
If you can afford to take a node out of commission for query, then one
approach would be to delete your YZ data and YZ AAE trees, and let AAE
sync your 30 million documents from Riak. You can increase AAE tree
rebuild and exchange concurrency to make that occur more quickly than
it does by default, but that will put a fairly significant load on
that node. Moreover, because you have deleted indexed data on one
node, you will get inconsistent search results from Yokozuna, as the
node being reindexed will still show up as part of a coverage plan.
Depending on the version of Riak, however, you may be able to
manually remove that node from coverage plans through the Riak console
while re-indexing is going on. The node is still available for Riak
get/put operations (including indexing new entries into Solr), but it
will be excluded from any cover set when a query plan is generated. I
can't guarantee that this would take less than 5 days, however.
-Fred
On Aug 29, 2016, at 3:56 AM, Guillaume Boddaert
<guilla...@lighthouse-analytics.co> wrote:
Hi,
I recently needed to alter my Riak Search schema for a bucket type
that contains ~30 million rows. As a result, my index was wiped,
since we are waiting for a Riak Search 2.2 feature that will sync
Riak storage with the Solr index on such an occasion.
I adapted a script suggested by Evren Esat Özkan here
(https://github.com/basho/yokozuna/issues/130#issuecomment-196189344). It
is a simple Python script that streams keys and triggers a store
action for every item. Unfortunately, it failed after 178k items due to a
timeout on the key stream. I calculated that this kind of
reindexing mechanism would need up to 5 days without a crash to
succeed.
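For reference, the 5-day estimate corresponds to a sustained rate of roughly 69 objects per second, and at that rate the 178k objects rewritten before the timeout amount to about 43 minutes of work. A quick check, assuming a constant rate (the helper names are mine):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def required_rate(items, days):
    """Objects per second needed to rewrite `items` objects in `days` days."""
    return items / (days * SECONDS_PER_DAY)

def minutes_to_process(items, rate):
    """Minutes spent on `items` objects at `rate` objects per second."""
    return items / rate / 60

rate = required_rate(30_000_000, 5)  # figures from the message above
print(f"{rate:.1f} objects/s")  # 69.4 objects/s
print(f"{minutes_to_process(178_000, rate):.0f} min before the failure")  # 43 min before the failure
```

In other words, a single run would have to keep the key stream healthy for days, which suggests that surviving interruptions matters as much as raw speed.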
I was wondering if there is a pure Erlang means of forcing a
complete rewrite of every single element in my bucket type,
rather than an error-prone and very long Python process.
How would you guys reindex a 30-million-item bucket type in a fast
and reliable way?
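On the "fast" side, one client-side option (not the pure Erlang route asked about above) is to rewrite several objects concurrently. A minimal sketch using a thread pool, assuming the `rewrite` callable is safe to call from several threads; the function name and the `workers` default are arbitrary. More workers means more load on the cluster, so the value would need tuning.

```python
from concurrent.futures import ThreadPoolExecutor

def rewrite_in_parallel(key_batches, rewrite, workers=8):
    """Re-store every key from `key_batches` using a pool of threads.

    Only one batch is in flight at a time, which bounds memory use.
    Returns (key, exception) pairs for objects that failed, so they can
    be retried later instead of aborting the whole run.
    """
    def attempt(key):
        try:
            rewrite(key)
            return None
        except Exception as exc:  # record the failure and keep going
            return (key, exc)

    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in key_batches:
            # map() runs the batch concurrently; consuming it waits for all of it
            failures.extend(r for r in pool.map(attempt, batch) if r is not None)
    return failures
```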
Thanks, Guillaume
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
riak_auth_mods_version : <<"2.1.0-0-g31b8b30">>
erlydtl_version : <<"0.7.0">>
riak_control_version : <<"2.1.2-0-gab3f924">>
cluster_info_version : <<"2.0.3-0-g76c73fc">>
yokozuna_version : <<"2.1.2-0-g3520d11">>
ibrowse_version : <<"4.0.2">>
riak_search_version : <<"2.1.1-0-gffe2113">>
merge_index_version : <<"2.0.1-0-g0c8f77c">>
riak_kv_version : <<"2.1.2-0-gf969bba">>
riak_api_version : <<"2.1.2-0-gd8d510f">>
riak_pb_version : <<"2.1.0.2-0-g620bc70">>
protobuffs_version : <<"0.8.1p5-0-gf88fc3c">>
riak_dt_version : <<"2.1.1-0-ga2986bc">>
sidejob_version : <<"2.0.0-0-gc5aabba">>
riak_pipe_version : <<"2.1.1-0-gb1ac2cf">>
riak_core_version : <<"2.1.5-0-gb02ab53">>
exometer_core_version : <<"1.0.0-basho2-0-gb47a5d6">>
poolboy_version : <<"0.8.1p3-0-g8bb45fb">>
pbkdf2_version : <<"2.0.0-0-g7076584">>
eleveldb_version : <<"2.0.17-0-g973fc92">>
clique_version : <<"0.3.2-0-ge332c8f">>
bitcask_version : <<"1.7.2">>
basho_stats_version : <<"1.0.3">>
webmachine_version : <<"1.10.8-0-g7677c24">>
mochiweb_version : <<"2.9.0">>
inets_version : <<"5.9.6">>
xmerl_version : <<"1.3.4">>
erlang_js_version : <<"1.3.0-0-g07467d8">>
runtime_tools_version : <<"1.8.12">>
os_mon_version : <<"2.2.13">>
riak_sysmon_version : <<"2.0.0">>
ssl_version : <<"5.3.1">>
public_key_version : <<"0.20">>
crypto_version : <<"3.1">>
asn1_version : <<"2.0.3">>
sasl_version : <<"2.3.3">>
lager_version : <<"2.1.1">>
goldrush_version : <<"0.1.7">>
compiler_version : <<"4.9.3">>
syntax_tools_version : <<"1.6.11">>
stdlib_version : <<"1.19.3">>
kernel_version : <<"2.16.3">>