Hi Ryan,

From the looks of the crash log, it seems that one of your merge index files 
may be corrupt (did you run out of disk space, or crash a node?)

At any rate, what seems to be happening is that the search vnode is in the 
middle of a handoff (presumably to the new machine), and while it is doing a 
full scan of the merge index segment files to transfer data, it encounters a 
bad file.  What you see in the crash log is that it tries to do 
binary_to_term(<<131,109,0,0,128,40,...>>) on a 46-byte binary, but the encoded 
stream says that the data should be 128*256+40 = 32808 bytes long.
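
(For reference: tag 131 marks Erlang's external term format, 109 means "binary", 
and the next four bytes are a big-endian payload length.  A quick shell 
illustration; the tail bytes below are made up, since the log truncates the 
real binary:)

    1> <<131, 109, Len:32/big-unsigned, _Tail/binary>> =
           <<131,109,0,0,128,40, 1,2,3>>.   %% made-up tail, just for illustration
    <<131,109,0,0,128,40,1,2,3>>
    2> Len.
    32808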

So something is too short, which I would guess happened because the server 
either crashed or ran out of disk space.  From a casual inspection of the code, 
it doesn't look like merge indexes are resilient to a node crashing while it is 
writing to disk.

I don't know search intimately, but I have seen mention of problems before that 
were caused by "bad indexes", and the resolution seems to be to delete the 
merge index files (the search index in your case 
/var/lib/riak/merge_index/159851741583067506678528028578343455274867621888), 
and then iterate over all values and re-write them.  Bummer.
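
If it does come to that, here is a very rough sketch of the "iterate and 
re-write" step, run from 'riak attach' after the bad merge_index directory has 
been removed and the node restarted (the bucket name and quorum values are just 
placeholders):

    %% re-put every object so the search pre-commit hook re-indexes it
    {ok, C} = riak:local_client(),
    {ok, Keys} = C:list_keys(<<"mybucket">>),   %% note: full key listing, slow
    lists:foreach(fun(K) ->
                          {ok, Obj} = C:get(<<"mybucket">>, K, 2),
                          ok = C:put(Obj, 2)
                  end, Keys).

That is per bucket, and with ~1M objects it will not be quick.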

Perhaps someone from Basho can chime in and tell us (A) if it seems plausible 
that the merge index segment files are indeed corrupt, and (B) if so, what is 
the right way to recover from that.

Kresten





On Jan 25, 2012, at 9:18 PM, Fisher, Ryan wrote:

Hello all,

We are hitting an issue with a riak 1.0.3 cluster when adding new nodes to the 
ring.  Specifically the handoff appears stuck and isn't making any progress.

I have read a number of the threads on here and realize handoff will take a 
while, and I have also tried attaching to the console and doing a force_update 
along with force_handoffs.  However, over 12 hours later the nodes haven't made 
any progress.  After digging through the log files, it appears that the search 
merge_index could be my problem?  Possibly the compaction isn't occurring 
properly?
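
For completeness, this is roughly what I ran from 'riak attach' (typing it from 
memory, so the exact calls may be slightly off):

    riak_core_ring_manager:force_update().
    riak_core_vnode_manager:force_handoffs().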

We are running a riak 1.0.3 cluster for a research project, utilizing the 
python client for reads, writes, and queries of the cluster.  With a small data 
set of 20k keys, things were humming along nicely.

We then started to ramp up the number of objects and ended up getting to around 
1M objects.  At this same time I added an additional node (w/ plans to expand 
to 8 nodes total).

However it appears that the partition handoff is stuck after performing the 
'join' command on the 5th node I was adding.

So currently it is a 4 + 1 node cluster with 4 GB of memory per node, running 
the bitcask backend with 'search' enabled on some of the buckets.  Specifically, 
I am using the 'out of the box' JSON encoding schema by simply setting the 
mime-type to "application/json" when I do the store from the python client.

I'm wondering if enabling search and using the default JSON schema was too much 
data to index?  Outside of increasing the Linux file limit on the nodes, 
enabling 'search' (in the config file and with the pre-commit hook), and upping 
the ring_creation_size to 256 (before I started or added any nodes), there 
shouldn't be much else out of the ordinary going on.  This was originally a 1.0 
riak cluster which I have been performing rolling upgrades on as the bug-fix 
releases come out; currently all 4 + 1 nodes are on 1.0.3.
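
The relevant app.config bits look roughly like this (paraphrased from memory, 
not copied verbatim):

    {riak_core, [
        {ring_creation_size, 256}
        %% (other riak_core settings left at defaults)
    ]},
    {riak_search, [
        {enabled, true}
    ]},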

Here are the (I hope) relevant error logs:

Riak error log:
http://pastebin.com/99cdPdCk

Riak crash log:
http://pastebin.com/07FRZkf2

Riak erlang log:
http://pastebin.com/DvdasWyR

Does anyone have any ideas on how to 'unstick' the partition handoff?  Or maybe 
the bigger question: is indexing all of the incoming data (disk space 
requirements aside) a bad idea?  Perhaps I need to write a custom schema that 
limits what gets indexed?
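
Going from the docs, I'm picturing something like the stock schema but with a 
catch-all dynamic field that skips everything except the couple of fields we 
actually query.  Very rough sketch, field name made up:

    {schema,
     [{version, "1.1"},
      {default_field, "name"},
      {default_op, "or"},
      {n_val, 3},
      {analyzer_factory, {erlang, text_analyzers, whitespace_analyzer_factory}}],
     [{field, [{name, "name"},
               {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}]},
      %% skip (don't index) everything else
      {dynamic_field, [{name, "*"},
                       {skip, true}]}
     ]}.

(And then loaded with search-cmd, if I'm reading the docs right.)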

I should mention that the search is a 'nice-to-have': the data is structured in 
a way that we know the keys we need at lookup time (for the most part), and I 
can probably use m/r to query the rest… With that in mind, if it comes down to 
it, can search be easily 'undone' on the cluster?  Maybe as simply as disabling 
the pre-commit hook, turning it off in the app.config, and then deleting the 
riak/merge_index directories on each node?
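
I.e. roughly the following, if I've got the right knobs (the bucket name is a 
placeholder for my real buckets):

    %% from 'riak attach': drop the pre-commit hook on each search-enabled bucket
    riak_core_bucket:set_bucket(<<"mybucket">>, [{precommit, []}]).

    %% in app.config on every node, before restarting
    {riak_search, [
        {enabled, false}
    ]},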


Thanks,
ryan







Mobile: +45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S  |  Margrethepladsen 4  |  DK-8000 Aarhus C  |  Phone: +45 8732 8787  |  www.trifork.com





_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
