Hello all,

We are hitting an issue with a Riak 1.0.3 cluster when adding new nodes to the ring. Specifically, the handoff appears stuck and isn't making any progress.
I have read a number of the threads on here and realize handoff will take a while. I have also tried attaching to the console and running force_update along with force_handoffs. However, over 12 hours later the nodes haven't made any progress. After digging through the log files, it appears that the search merge_index could be my problem. Possibly the compaction isn't occurring properly?

We are running a Riak 1.0.3 cluster for a research project, using the Python client for reads, writes, and queries of the cluster. With a small data set of 20k keys things were humming along nicely. We then started to ramp up the number of objects and got to around 1M objects. At the same time I added an additional node (with plans to expand to 8 nodes total). However, the partition handoff appears to have been stuck ever since I ran the 'join' command on the 5th node.

So currently it is a 4 + 1 node cluster with 4 GB of memory per node. I am running the bitcask backend with 'search' enabled on some of the buckets. Specifically, I am using the 'out of the box' JSON encoding schema by simply setting the mime-type to "application/json" when I do the store from the Python client. I'm wondering if enabling search and using the default JSON schema produced too much data to index?

Outside of increasing the Linux file limit on the nodes, enabling 'search' (in the config file and with the pre-commit hook), and upping the ring_creation_size to 256 (before I started or added any nodes), there shouldn't be much else out of the ordinary going on. This was originally a Riak 1.0 cluster on which I have been performing rolling upgrades as the bug-fix versions come out. Currently all 4 + 1 nodes are on 1.0.3.

Here are the *I hope* relevant error logs:

Riak error log: http://pastebin.com/99cdPdCk
Riak crash log: http://pastebin.com/07FRZkf2
Riak erlang log: http://pastebin.com/DvdasWyR

Does anyone have any ideas on how to 'unstick' the partition handoff?
Or maybe the bigger question is whether indexing all of the incoming data (outside of the disk space requirements) is a bad idea? Perhaps I need to write a custom schema that limits what gets indexed?

I should mention that search is a 'nice-to-have', but the data is structured in a way that we know the keys we need at lookup time (for the most part), and I can probably use m/r to query the rest. With that, I'm wondering: if it comes down to it, can search be easily 'undone' on the cluster? Maybe as simply as disabling the pre-commit hook, turning it off in the app.config, and then deleting the riak/merge_index directories on each node?

Thanks,
ryan
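
For a sense of scale, the join described above can be put in rough numbers. The following is a minimal Python sketch, assuming the 256 partitions are split as evenly as possible across the nodes. Riak's actual claim algorithm may assign or move partitions differently, and the helper name `partitions_per_node` is purely illustrative; it is not part of Riak or its Python client.

```python
def partitions_per_node(ring_size, nodes):
    """Split ring_size partitions as evenly as possible across nodes.

    Illustrative helper only (an assumption, not Riak's claim algorithm).
    Returns a list with one partition count per node.
    """
    base, extra = divmod(ring_size, nodes)
    # The first `extra` nodes take one additional partition each.
    return [base + 1 if i < extra else base for i in range(nodes)]


ring_size = 256  # ring_creation_size from the post

before = partitions_per_node(ring_size, 4)  # the original 4 nodes
after = partitions_per_node(ring_size, 5)   # after joining the 5th node

# Treat the last entry as the newly joined node's share.
new_node_share = after[-1]
fraction_moved = new_node_share / float(ring_size)

print("before join:", before)
print("after join: ", after)
print("partitions owed to new node:", new_node_share)
print("fraction of ring to hand off: %.1f%%" % (fraction_moved * 100))
```

Under that even-split assumption, roughly a fifth of the ring would have to be handed off to the 5th node before the join settles, and each of those partitions carries its search index data along with the bitcask data.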
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com