Hi Fred,

Thanks for the tips on the internal calls, I'll dig deeper!
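In case it's useful, this is the rough incantation I'm planning to try from
`riak attach`, extending your snippets with rpc calls so I can collect the
tree build times from every node in one go. It's an untested sketch; the
element(7, ...) offset assumes build_time is the sixth field of the
yz_index_hashtree #state{} record, exactly as in the output you pasted below,
so it may need adjusting.

    %% untested sketch - run from `riak attach` on any one node
    Nodes = [node() | nodes()].
    BuildTimes =
        [begin
             %% partitions owned by node N (same comprehension as yours)
             Partitions = [P || {_, P, _} <-
                 rpc:call(N, riak_core_vnode_manager, all_vnodes, [riak_kv_vnode])],
             [begin
                  {ok, Pid} = rpc:call(N, yz_entropy_mgr, get_tree, [P]),
                  State = sys:get_state(Pid),  %% works on remote pids too
                  %% assume build_time is the 6th record field, i.e.
                  %% element 7 of the underlying tuple
                  {N, P, calendar:now_to_local_time(element(7, State))}
              end || P <- Partitions]
         end || N <- Nodes].
    lists:keysort(3, lists:flatten(BuildTimes)).

Sorting on the converted timestamp should make it easy to spot which trees
are older than 20 days.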
I've attached recent results of `riak-admin search aae-status` from all nodes.

On 5 April 2016 at 22:41, Fred Dushin <fdus...@basho.com> wrote:

> Hi Oleksiy,
>
> I assume you are getting this information through riak-admin. Can you
> post the results here?
>
> If you want to dig deeper, you can probe the individual hash trees for
> their build time. I will paste a few snippets of Erlang here, which I am
> hoping you can extend to use with list comprehensions and rpc:multicalls.
> If that's too much to ask, let us know and I can try to put something
> together that is more of a "big easy button".
>
> First, on any individual node, you can get the Riak partitions on that
> node via
>
> (dev1@127.0.0.1)1> Partitions = [P || {_, P, _} <- riak_core_vnode_manager:all_vnodes(riak_kv_vnode)].
> [913438523331814323877303020447676887284957839360,
>  182687704666362864775460604089535377456991567872,
>  1187470080331358621040493926581979953470445191168,
>  730750818665451459101842416358141509827966271488,
>  1370157784997721485815954530671515330927436759040,
>  1004782375664995756265033322492444576013453623296,
>  822094670998632891489572718402909198556462055424,
>  456719261665907161938651510223838443642478919680,
>  274031556999544297163190906134303066185487351808,
>  1096126227998177188652763624537212264741949407232,
>  365375409332725729550921208179070754913983135744,
>  91343852333181432387730302044767688728495783936,
>  639406966332270026714112114313373821099470487552, 0,
>  1278813932664540053428224228626747642198940975104,
>  548063113999088594326381812268606132370974703616]
>
> For any one partition, you can get to the Pid associated with the
> yz_index_hashtree associated with that partition, e.g.,
>
> (dev1@127.0.0.1)2> {ok, Pid} = yz_entropy_mgr:get_tree(913438523331814323877303020447676887284957839360).
> {ok,<0.2872.0>}
>
> and from there you can get the state information about the hashtree, which
> includes its build time. You can read the record definitions associated
> with the yz_index_hashtree state by calling rr() on the yz_index_hashtree
> module first, if you want to make the state slightly more readable:
>
> (dev1@127.0.0.1)3> rr(yz_index_hashtree).
> [entropy_data,state,xmerl_event,xmerl_fun_states,
>  xmerl_scanner,xmlAttribute,xmlComment,xmlContext,xmlDecl,
>  xmlDocument,xmlElement,xmlNamespace,xmlNode,xmlNsNode,
>  xmlObj,xmlPI,xmlText]
> (dev1@127.0.0.1)5> sys:get_state(Pid).
> #state{index = 913438523331814323877303020447676887284957839360,
>        built = true,expired = false,lock = undefined,
>        path = "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>        build_time = {1459,801655,506719},
>        trees = [{{867766597165223607683437869425293042920709947392,3},
>                  {state,<<152,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>                         913438523331814323877303020447676887284957839360,3,1048576,1024,0,
>                         {dict,0,16,16,8,80,48,{[],[],...},{{...}}},
>                         <<>>,
>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>                         <<>>,incremental,[],0,
>                         {array,38837,0,...}}},
>                 {{890602560248518965780370444936484965102833893376,3},
>                  {state,<<156,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>                         913438523331814323877303020447676887284957839360,3,1048576,1024,0,
>                         {dict,0,16,16,8,80,48,{[],...},{...}},
>                         <<>>,
>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>                         <<>>,incremental,[],0,
>                         {array,38837,...}}},
>                 {{913438523331814323877303020447676887284957839360,3},
>                  {state,<<160,0,0,0,0,0,0,0,0,0,0,0,0,0,...>>,
>                         913438523331814323877303020447676887284957839360,3,1048576,1024,0,
>                         {dict,0,16,16,8,80,48,{...},...},
>                         <<>>,
>                         "./data/yz_anti_entropy/913438523331814323877303020447676887284957839360",
>                         <<>>,incremental,[],0,
>                         {array,...}}}],
>        closed = false}
>
> You can convert the timestamp to local time via:
>
> (dev1@127.0.0.1)8> calendar:now_to_local_time({1459,801655,506719}).
> {{2016,4,4},{16,27,35}}
>
> Again, this is just an example, but with the right Erlang incantations,
> you should be able to iterate over all the timestamps across all the nodes.
>
> Let us know if that is helpful, or if you need more examples so you can do
> it in one swipe.
>
> -Fred
>
> On Apr 5, 2016, at 9:29 AM, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>
> How can I check that the AAE trees have expired? Yesterday I ran
> "riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000)."
> on each node (just to be sure). Still, today I see that on 3 nodes (of 5)
> all entropy trees and all last AAE exchanges are older than 20 days.
>
> On 4 April 2016 at 17:15, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>
>> Continuation...
>>
>> The new index has the same inconsistent search results problem.
>> I took a snapshot of the `search aae-status` output almost every day.
>> There were absolutely no Yokozuna errors in the logs.
>>
>> I can see that some AAE trees were not expired (built > 20 days ago). I
>> can also see that on two nodes (of 5) the last AAE exchanges happened
>> more than 20 days ago.
>>
>> For now I have issued
>> `riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).`
>> on each node again. I will wait 10 more days, but I don't think that will
>> fix anything.
>>
>> On 25 March 2016 at 09:28, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>>
>>> One interesting thing happened when I tried removing the index:
>>>
>>> - this index was associated with a bucket type called fs_chunks
>>> - so I first called RpbSetBucketTypeReq to set search_index: _dont_index_
>>> - I then tried to remove the index with RpbYokozunaIndexDeleteReq, which
>>>   failed with "index is in use" and a list of all buckets of the
>>>   fs_chunks type
>>> - for some reason all these buckets had their own search_index property
>>>   set to that same index
>>>
>>> How can this happen if I definitely never set the search_index property
>>> per bucket?
>>>
>>> On 24 March 2016 at 22:41, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>>>
>>>> OK!
>>>>
>>>> On 24 March 2016 at 21:11, Magnus Kessler <mkess...@basho.com> wrote:
>>>>
>>>>> Hi Oleksiy,
>>>>>
>>>>> On 24 March 2016 at 14:55, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>>>>>
>>>>>> Hi Magnus,
>>>>>>
>>>>>> Thanks! I guess I will go with index deletion because I've already
>>>>>> tried expiring the trees before.
>>>>>>
>>>>>> Do I need to delete the AAE data somehow, or is removing the index enough?
>>>>>
>>>>> If you expire the AAE trees with the commands I posted earlier, there
>>>>> should be no need to remove the AAE data directories manually.
>>>>>
>>>>> I hope this works for you. Please monitor the tree rebuilds and
>>>>> exchanges with `riak-admin search aae-status` over the next few days. In
>>>>> particular, the exchanges should be ongoing on a continuous basis once all
>>>>> trees have been rebuilt. If they aren't, please let me know. At that point
>>>>> you should also gather `riak-debug` output from all nodes before it gets
>>>>> rotated out (after 5 days by default).
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Magnus
>>>>>
>>>>>> On 24 March 2016 at 13:28, Magnus Kessler <mkess...@basho.com> wrote:
>>>>>>
>>>>>>> Hi Oleksiy,
>>>>>>>
>>>>>>> As a first step, I suggest simply expiring the Yokozuna AAE trees
>>>>>>> again if the output of `riak-admin search aae-status` still suggests that
>>>>>>> no recent exchanges have taken place. To do this, run `riak attach` on one
>>>>>>> node and then
>>>>>>>
>>>>>>> riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).
>>>>>>>
>>>>>>> Exit from the riak console with `Ctrl+G q`.
>>>>>>>
>>>>>>> Depending on your settings and the amount of data, the full index
>>>>>>> should be rebuilt within the next 2.5 days (for a cluster with ring size
>>>>>>> 128 and default settings). You can monitor the progress with `riak-admin
>>>>>>> search aae-status` and also in the logs, which should have messages along
>>>>>>> the lines of
>>>>>>>
>>>>>>> 2016-03-24 10:28:25.372 [info] <0.4647.6477>@yz_exchange_fsm:key_exchange:179 Repaired 83055 keys during active anti-entropy exchange of partition 1210306043414653979137426502093171875652569137152 for preflist {1164634117248063262943561351070788031288321245184,3}
>>>>>>>
>>>>>>> Re-indexing can put additional strain on the cluster and may cause
>>>>>>> elevated latency on a cluster already under heavy load. Please monitor the
>>>>>>> response times while the cluster is re-indexing data.
>>>>>>>
>>>>>>> If the cluster load allows it, you can force more rapid re-indexing
>>>>>>> by changing a few parameters. Again at the `riak attach` console, run
>>>>>>>
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {4, 60000}], 5000).
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 5], 5000).
>>>>>>>
>>>>>>> This will allow up to 4 trees per node to be built/exchanged per
>>>>>>> hour, with up to 5 concurrent exchanges throughout the cluster. To return
>>>>>>> to the default settings, use
>>>>>>>
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_build_limit, {1, 360000}], 5000).
>>>>>>> riak_core_util:rpc_every_member_ann(application, set_env, [yokozuna, anti_entropy_concurrency, 2], 5000).
>>>>>>>
>>>>>>> If the cluster still doesn't make any progress with automatically
>>>>>>> re-indexing data, the next steps are pretty much what you already
>>>>>>> suggested: drop the existing index and re-index from scratch. I'm
>>>>>>> assuming that losing the indexes temporarily is acceptable to you at
>>>>>>> this point.
>>>>>>>
>>>>>>> Using any client API that supports RpbYokozunaIndexDeleteReq, you
>>>>>>> can drop the index from all Solr instances, losing any data stored there
>>>>>>> immediately. Next, you'll have to re-create the index. I have tried this
>>>>>>> with the Python API, where I deleted the index and re-created it with the
>>>>>>> same already uploaded schema:
>>>>>>>
>>>>>>> from riak import RiakClient
>>>>>>>
>>>>>>> c = RiakClient()
>>>>>>> c.delete_search_index('my_index')
>>>>>>> c.create_search_index('my_index', 'my_schema')
>>>>>>>
>>>>>>> Note that simply deleting the index does not remove its existing
>>>>>>> association with any bucket or bucket type. Any PUT operations on these
>>>>>>> buckets will lead to indexing failures being logged until the index has
>>>>>>> been recreated. However, this also means that no separate operation in
>>>>>>> `riak-admin` is required to associate the newly recreated index with the
>>>>>>> buckets again.
>>>>>>>
>>>>>>> After recreating the index, expire the trees as explained previously.
>>>>>>>
>>>>>>> Let us know if this solves your issue.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Magnus
>>>>>>>
>>>>>>> On 24 March 2016 at 08:44, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is how things are looking after two weeks:
>>>>>>>>
>>>>>>>> - there have been no Solr indexing issues for a long time (2 weeks)
>>>>>>>> - there have been no Yokozuna errors at all for 2 weeks
>>>>>>>> - there is an index with an all-empty schema, just the _yz_* fields;
>>>>>>>>   objects stored in the bucket(s) are binary and so are not analysed
>>>>>>>>   by Yokozuna
>>>>>>>> - the same Yokozuna query, repeated, gives a different num_found each
>>>>>>>>   time; typically the difference between the real number of keys in a
>>>>>>>>   bucket and num_found is about 25%
>>>>>>>> - the number of keys repaired by AAE (according to the logs) is about
>>>>>>>>   1-2 per few hours (the number of keys "missing" from the index is
>>>>>>>>   close to 1,000,000)
>>>>>>>>
>>>>>>>> Should I now try to delete the index and the Yokozuna AAE data and
>>>>>>>> wait another 2 weeks? If yes, how should I delete the index and the AAE
>>>>>>>> data? Will RpbYokozunaIndexDeleteReq be enough?
>>>>>>>
>>>>>>> --
>>>>>>> Magnus Kessler
>>>>>>> Client Services Engineer
>>>>>>> Basho Technologies Limited
>>>>>>>
>>>>>>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>>>>>
>>>>> --
>>>>> Magnus Kessler
>>>>> Client Services Engineer
>>>>> Basho Technologies Limited
>>>>>
>>>>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
Attachment: search-aae.tar.gz (GNU Zip compressed data)
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com