Last night we did two things. First, we upgraded our entire cluster from riak-search 0.14.2 to 1.0.1. This went pretty well, and the cluster was responding correctly once the upgrade was complete.
Our cluster stores around 40,000 files in Luwak, plus about as many keys (or more) in Riak, mostly metadata for those Luwak files. The files range in size from around 50 KB to around 400 MB, though most are fairly small. I think we're up to a total of around 30 GB now.

Second, we added a new node to the now-1.0.1 cluster. Right after that, I saw the beam.smp processes on all the servers, including the new one, consuming almost all available CPU. They stayed in this state for around an hour, and during that time the cluster was slow to respond and occasionally timed out. Riak also crashed on random nodes from time to time, and I had to restart it. After about an hour, things settled down.

I then added the new node to our load balancer so it could serve requests too. When we tested our apps against the cluster, we still got lots of timeouts, and something seemed very, very wrong. After a while I ran "riak-admin leave" on the newly added node (kind of a panic move, I guess). Around 20 minutes later, the cluster started responding correctly again.

All was not well, though: some files seemed to be corrupted (I'm not sure what percentage, but it could be 1% or more). I have no idea how that could happen, but files that we had accessed before now contained garbage. I haven't thoroughly researched exactly WHAT garbage they contain, but they're no longer usable.

Is this something that could happen under any circumstances in Riak? I'm afraid to add a node at all now, since my one attempt resulted in downtime and corruption. I checked and rechecked the configuration files, and they really are the same on all the nodes (except for vm.args, where the node names differ, of course).

Has anyone ever seen anything like this? Could it be related to the fact that I upgraded from 0.14.2 to 1.0.1 and then, maybe an hour later, added a new 1.0.1 node?

Thanks for any input!
John

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com