2012/1/13 Ryan Zezeski <rzeze...@basho.com>:
> Francisco,
>
> This is a tricky one indeed. If I'm following you thus far:
>
> 1. In November you had an incident with the cluster but re-saved all docs to
> repopulate the index, things were good

Correct

> 2. Then in Dec you compared a mapred vs. search query, you didn't get
> expected results, re-saved docs again to repopulate search index, reran
> mapred and search queries and things were good again. Question a) what
> compelled you to do this comparison in the first place? That is, why did
> you suspect a discrepancy?

Users reporting missing documents, same as in January.

> 3. Now in early Jan you had a customer report that they suspected missing
> results, sure enough you claim that 5% of the search indexes were missing,
> you re-saved docs to repopulate search index, things are good again
>
> Question b) these missing docs _have not_ been changed between the time you
> re-saved in Dec and the time the incident was reported by the user?

I'm pretty sure they haven't (especially for the user that complained),
but can't put my finger on it.

> This is
> important because if the document changed then perhaps there is a corner
> case in the KV hook that I can search for. If the docs haven't been updated
> then I'm a little dumbfounded because it potentially means the index is
> losing data. Given that other users of Search haven't reported any similar
> issues (that I'm aware of) it seems we have more digging to do.

(I don't know how much Search has changed in 1.0.x, but keep in mind
I'm running 0.14.2 in production.)

I'd love to provide you with more information, but don't have much
time to go digging around. The workaround is, well, a workaround, but
it let us move on.

I'll keep a special eye on this until it happens again, and let's
definitely keep each other posted.

Thanks,

Francisco

> On Thu, Jan 12, 2012 at 1:04 PM, francisco treacy
> <francisco.tre...@gmail.com> wrote:
>>
>> Hi Ryan,
>>
>> I had an incident when removing a replica back in November. But then I
>> managed to fix the state of the cluster. Then in December I ran the
>> script to re-save all docs so it worked (i.e. documents matched), and
>> now in January we were still noticing discrepancies.
>>
>> I noticed the problem only twice; perhaps it happened more often.
>> Avg load is way under control, no spikes, no faulty disks. The cluster
>> hasn't changed at all since November.
>>
>> Francisco
>>
>> 2012/1/10 Ryan Zezeski <rzeze...@basho.com>:
>> > Typically, something like this would be caused by replicas being lost as
>> > Search currently has no anti-entropy. I imagine you would have mentioned
>> > replica loss if that was the case, though. You've seen this problem at
>> > least twice, correct? Were there other occurrences? If so how many and
>> > about how often do you notice them? Did any other system events happen
>> > that might be correlated such as spike in load, faulty disks, node
>> > join/leave/remove, node down/up, etc.
>> >
>> > On Tue, Jan 10, 2012 at 11:31 AM, francisco treacy
>> > <francisco.tre...@gmail.com> wrote:
>> >>
>> >> 2012/1/10 Ryan Zezeski <rzeze...@basho.com>:
>> >> > When you say "doesn't show up on the search query results *anymore*"
>> >> > does that imply that at one time they did?
>> >>
>> >> Absolutely. As a matter of fact that's what some of my users were
>> >> asking: "why can't I see X anymore?".
>> >>
>> >> > I'm trying to understand if the index entries appear to have been
>> >> > lost or if it was never successfully written in the first place.
>> >>
>> >> They are definitely written to the index in the first place. I recall
>> >> noticing a similar situation early December, where I compared item
>> >> counts from a full-bucket map/reduce vs. a search map/reduce and they
>> >> didn't match.
>> >>
>> >> Went about it by re-saving all items in the bucket as I said before.
>> >> Did a check just after, and numbers matched.
>> >>
>> >> And this morning I was back to the inequality after a user complaint,
>> >> so ran the re-save script once again. Obviously this is not
>> >> sustainable.
>> >>
>> >> The curious thing is that those documents are not even regularly
>> >> updated or deleted.
>> >>
>> >> > Any errors you see in the logs may be relevant. Feel free to include
>> >> > snippets of anything you find.
>> >>
>> >> I went through all sasl-error.log from all 3 nodes and couldn't find
>> >> anything related to search. Where else could I look? Do you have any
>> >> idea what could be causing this behaviour?
>> >>
>> >> Francisco
>> >>
>> >> >
>> >> > On Tue, Jan 10, 2012 at 4:16 AM, francisco treacy
>> >> > <francisco.tre...@gmail.com> wrote:
>> >> >>
>> >> >> Hi all,
>> >> >>
>> >> >> I am running (still) 0.14.2 in production, and using Riak Search to
>> >> >> index certain buckets. Those buckets have the Riak Search pre-commit
>> >> >> hook enabled.
>> >> >>
>> >> >> Once in a while I get complaints of missing documents from my
>> >> >> clients, and it just happened.
>> >> >>
>> >> >> I checked and the document:
>> >> >> - is correctly stored in Riak
>> >> >> - doesn't show up on the search query results *anymore*
>> >> >>
>> >> >> My workaround before was to go through every key in the bucket and
>> >> >> save it again, so as to trigger the search hook. Of course, this is
>> >> >> less than ideal.
>> >> >>
>> >> >> Should I be looking for something specific in the logs? What could
>> >> >> be going on here?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Francisco
>> >> >>
>> >> >> _______________________________________________
>> >> >> riak-users mailing list
>> >> >> riak-users@lists.basho.com
>> >> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
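
For reference, the check-and-repair workaround described in this thread (compare the keys stored in the bucket against the keys the search index returns, then re-save the ones that are missing) can be sketched roughly as below. This is only a minimal sketch and not the script actually used in the thread: `list_bucket_keys`, `search_index_keys`, `fetch` and `store` are hypothetical callables standing in for whatever Riak client calls are available, and the sketch re-saves only the missing keys rather than every key in the bucket.

```python
from typing import Callable, Iterable, Set


def find_unindexed_keys(bucket_keys: Iterable[str],
                        indexed_keys: Iterable[str]) -> Set[str]:
    """Return keys that are stored in the bucket but absent from search results."""
    return set(bucket_keys) - set(indexed_keys)


def resave_missing(
    list_bucket_keys: Callable[[], Iterable[str]],   # hypothetical: all keys in the bucket
    search_index_keys: Callable[[], Iterable[str]],  # hypothetical: keys returned by search
    fetch: Callable[[str], object],                  # hypothetical: read one document
    store: Callable[[str, object], None],            # hypothetical: write it back
) -> Set[str]:
    """Re-save every document missing from the index so the pre-commit hook runs again."""
    missing = find_unindexed_keys(list_bucket_keys(), search_index_keys())
    for key in sorted(missing):
        # Writing the unchanged document back is what re-triggers indexing.
        store(key, fetch(key))
    return missing
```

Logging the returned set each time the check runs would also give a record of which documents dropped out of the index and when, which may help answer the question of whether they had been modified in between.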