Pavel, as an alternative to re-writing the objects to get them indexed, you can invoke what I call a map operation with side-effects.
You define an Erlang map-phase function as follows:

    map_reindex({error, notfound}, _, _) ->
        [];
    map_reindex(RiakObject, _, _) ->
        riak_search_kv_hook:precommit(RiakObject),
        [].

You then run it against all of the keys in the bucket by posting a mapred job like this:

    {
      "inputs": "<your-bucket>",
      "query": [
        {
          "map": {
            "function": "map_reindex",
            "language": "erlang",
            "module": "<your-module>"
          }
        }
      ],
      "timeout": <your-timeout>
    }

We have used this technique to re-index rather large clusters, and it runs quickly because the work is done in parallel across all of the nodes in the cluster. (Untested console sketches for the enable/install/disable steps Ryan describes below follow the quoted thread.)

-- gordon

On Oct 16, 2012, at 07:44, Ryan Zezeski <rzeze...@basho.com> wrote:

> On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan <pavel.ko...@cortica.com> wrote:
>
>> 1) Does enabling search have any impact on read latency/throughput?
>
> If you are reading and searching at the same time, there is a good chance it will. It will cause more disk seeks.
>
>> 2) Does enabling search have any impact on RAM usage?
>
> Yes, the index engine behind Riak Search makes heavy use of Erlang ETS tables. Each partition has an in-memory buffer as well as an in-memory offset table for every segment. It also uses a temporary ETS table for every write to store posting data. The ETS system limit can even become an issue in overload scenarios.
>
>> 3) In production we have no search enabled. What is the best way to enable search without stopping production? I thought about something like:
>> 1) Enable search node after node.
>
> You could change the app env dynamically, but that's only half the problem. The other half is then starting the Riak Search application. I think application:start(merge_index) followed by application:start(riak_search) should work, but I'm not 100% sure and this has not been tested. You'll also want to edit all app.configs so that the change is persistent.
>
>> 2) Execute some night script that runs on all keys and overwrites them back with the proper MIME type.
>
> Yes, you'll want to install the commit hook on the buckets you wish to index. Then you'll want to do a streaming list-keys or a bucket map-reduce and re-write the data.
>
>> 4) If we see that the search overhead is something we can't handle, is there a simple way to disable it without stopping production?
>
> I think the best course of action in this case would be to disable the commit hook. But you would have to keep track of anything written during this time and re-write it after re-installing the hook. If you don't, then you'll have to re-index everything, because you won't know what you missed.
>
>> 5) In what case would we need repair? It is said to be for replica loss, but if I understand correctly we have 3 replicas on different nodes, don't we? If it happens, how difficult and long would it be for a large cluster (about 100 nodes)?
>
> Repair is on a per-partition basis; the number of nodes doesn't come into play. Repair is very specific in that it requires the adjacent partitions to be in a good, convergent state. If they aren't, then repair isn't much help.
>
> A lot of these entropy issues go away in Yokozuna. Repairing indexes is done automatically, in the background, in an efficient manner. There is no need to re-write data or run manual repair commands.
>
> -Z
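To make those steps concrete, here is a rough sketch of the console side, run from "riak attach" on each node. Treat it as untested and from memory: <<"your-bucket">> and your_module are placeholders, and riak_search_kv_hook:install/1 and uninstall/1 are, as far as I know, the calls search-cmd makes under the covers.

    %% Untested sketch -- run from "riak attach" on each node.

    %% 1) Enable and start Riak Search dynamically; mirror the change in
    %%    app.config ({riak_search, [{enabled, true}]}) so it survives restarts.
    application:set_env(riak_search, enabled, true),
    application:start(merge_index),
    application:start(riak_search).

    %% 2) Install the indexing precommit hook on a bucket; uninstall/1
    %%    removes it again if the overhead turns out to be too much.
    riak_search_kv_hook:install(<<"your-bucket">>).
    %% riak_search_kv_hook:uninstall(<<"your-bucket">>).

    %% 3) Submit the side-effecting reindex job from the console rather
    %%    than over HTTP; your_module must be loadable on every node.
    {ok, C} = riak:local_client(),
    C:mapred_bucket(<<"your-bucket">>,
                    [{map, {modfun, your_module, map_reindex}, none, true}]).

The final true in the map phase spec accumulates the (empty) results, matching the JSON job above; since map_reindex always returns [], the job exists purely for its side effect of re-running the indexing hook.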
_______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com