I would like to continue as this seems to me like a serious problem: on a bucket with 700,000 keys the difference in num_found can be up to 200,000! And that's a search index that doesn't index, analyse or store ANY of the document fields; the schema has only the required _yz_* fields and nothing else.
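(For reference, the difference shows up with a simple repeated count query. A rough sketch with the Erlang PB client; the endpoint is a placeholder and riakc.hrl is assumed to be included for the #search_results{} record:)

    %% Sketch only: run the same query twice against chunks_index and compare num_found.
    {ok, Pid} = riakc_pb_socket:start_link("10.0.1.1", 8087),
    Q = <<"_yz_rb:0dmid2ilpyrfiuaqtvnc482f1esdchb5.chunks">>,
    {ok, R1} = riakc_pb_socket:search(Pid, <<"chunks_index">>, Q, [{rows, 0}]),
    {ok, R2} = riakc_pb_socket:search(Pid, <<"chunks_index">>, Q, [{rows, 0}]),
    io:format("num_found: ~p vs ~p~n",
              [R1#search_results.num_found, R2#search_results.num_found]).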
I have tried deleting the search index (with a PBC call) and tried expiring AAE trees. Nothing helps. I can't get consistent search results from Yokozuna. Please help.

On 11 March 2016 at 18:18, Oleksiy Krivoshey <oleks...@gmail.com> wrote:

> Hi Fred,
>
> This is a production environment but I can delete the index. However this
> index covers ~3500 buckets and there are probably 10,000,000 keys.
>
> The index was created after the buckets. The schema for the index is just
> the basic required fields (_yz_*) and nothing else.
>
> Yes, I'm willing to resolve this. When you say to delete chunks_index, do
> you mean the simple RpbYokozunaIndexDeleteReq or is something else required?
>
> Thanks!
>
> On 11 March 2016 at 17:08, Fred Dushin <fdus...@basho.com> wrote:
>
>> Hi Oleksiy,
>>
>> This is definitely pointing to an issue either in the coverage plan
>> (which determines the distributed query you are seeing) or in the data
>> you have in Solr. I am wondering if it is possible that you have some
>> data in Solr that is causing the rebuild of the YZ AAE tree to
>> incorrectly represent what is actually stored in Solr.
>>
>> What you did was to manually expire the YZ (Riak Search) AAE trees, which
>> caused them to rebuild from the entropy data stored in Solr. Another
>> thing we could try (if you are willing) would be to delete the
>> 'chunks_index' data in Solr (as well as the Yokozuna AAE data), and then
>> let AAE repair the missing data. What Riak will essentially do is compare
>> the KV hash trees with the YZ hash trees (which will be empty), find the
>> data that is missing in Solr, and add it to Solr as a result. This would
>> effectively result in re-indexing all of your data, but we are only
>> talking about ~30k entries (times 3, presumably, if your n_val is 3), so
>> that shouldn't take too much time, I wouldn't think. There is even some
>> configuration you can use to accelerate this process, if necessary.
>>
>> Is that something you would be willing to try? It would result in down
>> time on query. Is this production data or a test environment?
>>
>> -Fred
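(For reference, the RpbYokozunaIndexDeleteReq asked about above maps to riakc_pb_socket:delete_search_index/2 in the Erlang client. A minimal sketch only; the endpoint is a placeholder, and whether this also clears the on-disk Solr data Fred refers to is exactly the open question:)

    %% Sketch only: issue RpbYokozunaIndexDeleteReq via the Erlang client.
    %% The index typically has to be dissociated from its buckets
    %% (search_index bucket property) before the delete succeeds.
    {ok, Pid} = riakc_pb_socket:start_link("10.0.1.1", 8087),
    ok = riakc_pb_socket:delete_search_index(Pid, <<"chunks_index">>).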
>>
>> On Mar 11, 2016, at 7:38 AM, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>>
>> Here are two consecutive requests; one returns 30118 keys, the other 37134:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">6</int>
>>     <lst name="params">
>>       <str name="10.0.1.3:8093">_yz_pn:92 OR _yz_pn:83 OR _yz_pn:71 OR _yz_pn:59 OR _yz_pn:50 OR _yz_pn:38 OR _yz_pn:17 OR _yz_pn:5</str>
>>       <str name="10.0.1.2:8093">_yz_pn:122 OR _yz_pn:110 OR _yz_pn:98 OR _yz_pn:86 OR _yz_pn:74 OR _yz_pn:62 OR _yz_pn:26 OR _yz_pn:14 OR _yz_pn:2</str>
>>       <str name="shards">10.0.1.1:8093/internal_solr/chunks_index,10.0.1.2:8093/internal_solr/chunks_index,10.0.1.3:8093/internal_solr/chunks_index,10.0.1.4:8093/internal_solr/chunks_index,10.0.1.5:8093/internal_solr/chunks_index</str>
>>       <str name="q">_yz_rb:0dmid2ilpyrfiuaqtvnc482f1esdchb5.chunks</str>
>>       <str name="10.0.1.5:8093">(_yz_pn:124 AND (_yz_fpn:124 OR _yz_fpn:123)) OR _yz_pn:116 OR _yz_pn:104 OR _yz_pn:80 OR _yz_pn:68 OR _yz_pn:56 OR _yz_pn:44 OR _yz_pn:32 OR _yz_pn:20 OR _yz_pn:8</str>
>>       <str name="10.0.1.1:8093">_yz_pn:113 OR _yz_pn:101 OR _yz_pn:89 OR _yz_pn:77 OR _yz_pn:65 OR _yz_pn:53 OR _yz_pn:41 OR _yz_pn:29</str>
>>       <str name="10.0.1.4:8093">_yz_pn:127 OR _yz_pn:119 OR _yz_pn:107 OR _yz_pn:95 OR _yz_pn:47 OR _yz_pn:35 OR _yz_pn:23 OR _yz_pn:11</str>
>>       <str name="rows">0</str>
>>     </lst>
>>   </lst>
>>   <result maxScore="6.364349" name="response" numFound="30118" start="0"></result>
>> </response>
>>
>> ------
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">10</int>
>>     <lst name="params">
>>       <str name="10.0.1.3:8093">_yz_pn:100 OR _yz_pn:88 OR _yz_pn:79 OR _yz_pn:67 OR _yz_pn:46 OR _yz_pn:34 OR _yz_pn:25 OR _yz_pn:13 OR _yz_pn:1</str>
>>       <str name="10.0.1.2:8093">(_yz_pn:126 AND (_yz_fpn:126 OR _yz_fpn:125)) OR _yz_pn:118 OR _yz_pn:106 OR _yz_pn:94 OR _yz_pn:82 OR _yz_pn:70 OR _yz_pn:58 OR _yz_pn:22 OR _yz_pn:10</str>
>>       <str name="shards">10.0.1.1:8093/internal_solr/chunks_index,10.0.1.2:8093/internal_solr/chunks_index,10.0.1.3:8093/internal_solr/chunks_index,10.0.1.4:8093/internal_solr/chunks_index,10.0.1.5:8093/internal_solr/chunks_index</str>
>>       <str name="q">_yz_rb:0dmid2ilpyrfiuaqtvnc482f1esdchb5.chunks</str>
>>       <str name="10.0.1.5:8093">_yz_pn:124 OR _yz_pn:112 OR _yz_pn:76 OR _yz_pn:64 OR _yz_pn:52 OR _yz_pn:40 OR _yz_pn:28 OR _yz_pn:16 OR _yz_pn:4</str>
>>       <str name="10.0.1.1:8093">_yz_pn:121 OR _yz_pn:109 OR _yz_pn:97 OR _yz_pn:85 OR _yz_pn:73 OR _yz_pn:61 OR _yz_pn:49 OR _yz_pn:37</str>
>>       <str name="10.0.1.4:8093">_yz_pn:115 OR _yz_pn:103 OR _yz_pn:91 OR _yz_pn:55 OR _yz_pn:43 OR _yz_pn:31 OR _yz_pn:19 OR _yz_pn:7</str>
>>       <str name="rows">0</str>
>>     </lst>
>>   </lst>
>>   <result maxScore="6.364349" name="response" numFound="37134" start="0"></result>
>> </response>
>>
>> On 11 March 2016 at 12:05, Oleksiy Krivoshey <oleks...@gmail.com> wrote:
>>
>>> So even when I fixed the 3 documents which caused AAE errors, restarted AAE with
>>>
>>>     riak_core_util:rpc_every_member_ann(yz_entropy_mgr, expire_trees, [], 5000).
>>>
>>> and waited 5 days (I now see all AAE trees rebuilt in the last 5 days and no AAE
>>> or Solr errors), I still get inconsistent num_found.
>>>
>>> For a bucket with 30,000 keys each new search request can result in a
>>> difference in num_found of over 5,000.
>>>
>>> What else can I do to get a consistent index, or at least not a 15%
>>> difference?
>>>
>>> I even tried to walk through all the bucket keys and modify them in the
>>> hope that all Yokozuna instances in the cluster would pick them up, but no
>>> luck.
>>>
>>> Thanks!
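(For reference, the "walk through all the bucket keys and modify them" step mentioned in the quoted mail looks roughly like this with the Erlang PB client. A sketch only; the endpoint and bucket name are placeholders, it assumes objects have no siblings, and list_keys is expensive on large buckets:)

    %% Sketch only: rewrite every object unchanged so Yokozuna re-indexes it.
    {ok, Pid} = riakc_pb_socket:start_link("10.0.1.1", 8087),
    Bucket = <<"0dmid2ilpyrfiuaqtvnc482f1esdchb5.chunks">>,
    {ok, Keys} = riakc_pb_socket:list_keys(Pid, Bucket),
    lists:foreach(fun(K) ->
        {ok, Obj} = riakc_pb_socket:get(Pid, Bucket, K),
        %% put the existing value back; the write should trigger re-indexing
        ok = riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj, riakc_obj:get_value(Obj)))
    end, Keys).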
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com