Dead node not leaving with force-remove
We have a 1.0.2 cluster with a node that's gone but still listed in member_status (as legacy):

riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)    7.8%      --      'riak@10.115.13.51'
valid       7.8%      --      'riak@10.119.82.164'
valid       7.8%      --      'riak@10.13.22.183'
valid       7.8%      --      'riak@10.13.51.171'
valid       7.8%      --      'riak@10.37.46.7'
valid       7.6%      --      'riak@10.76.45.111'
valid       7.6%      --      'riak@10.76.62.122'
valid       7.6%      --      'riak@10.78.197.82'
valid       7.6%      --      'riak@10.79.69.234'
valid       7.6%      --      'riak@10.80.157.112'
valid       7.6%      --      'riak@10.82.155.202'
valid       7.6%      --      'riak@10.82.25.84'
valid       7.6%      --      'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

riak-admin force-remove does not remove the node (we've tried a few times over the last 4 days). This node was down before we did the upgrade; it wasn't removing, so we upgraded anyway. As you can see here, the force-remove reports success, but the node is still listed:

riak-admin force-remove 'riak@10.115.13.51'
Attempting to restart script through sudo -u riak
Success: "riak@10.115.13.51" removed from the cluster

riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)    7.8%      --      'riak@10.115.13.51'
valid       7.8%      --      'riak@10.119.82.164'
valid       7.8%      --      'riak@10.13.22.183'
valid       7.8%      --      'riak@10.13.51.171'
valid       7.8%      --      'riak@10.37.46.7'
valid       7.6%      --      'riak@10.76.45.111'
valid       7.6%      --      'riak@10.76.62.122'
valid       7.6%      --      'riak@10.78.197.82'
valid       7.6%      --      'riak@10.79.69.234'
valid       7.6%      --      'riak@10.80.157.112'
valid       7.6%      --      'riak@10.82.155.202'
valid       7.6%      --      'riak@10.82.25.84'
valid       7.6%      --      'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

After this, a small number of handoffs are seen in the logs:

17:13:29.514 [info] Handoff of partition riak_kv_vnode 1033327329519114953886199041881434478741108555776 from 'riak@10.13.51.171' to 'riak@10.82.25.84' completed: sent 2 objects in 0.04 seconds
17:13:39.470 [info] Starting handoff of partition riak_kv_vnode 176978713895539025251227460211737396911460581376 from 'riak@10.13.51.171' to 'riak@10.82.155.202'
17:13:39.480 [info] Starting handoff of partition riak_kv_vnode 1238850997268773176758592221482161778380224069632 from 'riak@10.13.51.171' to 'riak@10.76.45.111'
17:13:39.512 [info] Handoff of partition riak_kv_vnode 176978713895539025251227460211737396911460581376 from 'riak@10.13.51.171' to 'riak@10.82.155.202' completed: sent 1 objects in 0.04 seconds
17:13:39.525 [info] Handoff of partition riak_kv_vnode 1238850997268773176758592221482161778380224069632 from 'riak@10.13.51.171' to 'riak@10.76.45.111' completed: sent 3 objects in 0.04 seconds

Here's the pending transfer list:

riak-admin transfers
Nodes ['riak@10.115.13.51'] are currently down.
'riak@10.84.5.68' waiting to handoff 28 partitions
'riak@10.82.25.84' waiting to handoff 40 partitions
'riak@10.82.155.202' waiting to handoff 40 partitions
'riak@10.80.157.112' waiting to handoff 40 partitions
'riak@10.79.69.234' waiting to handoff 40 partitions
'riak@10.78.197.82' waiting to handoff 39 partitions
'riak@10.76.62.122' waiting to handoff 40 partitions
'riak@10.76.45.111' waiting to handoff 40 partitions
'riak@10.37.46.7' waiting to handoff 40 partitions
'riak@10.13.51.171' waiting to handoff 40 partitions
'riak@10.13.22.183' waiting to handoff 40 partitions
'riak@10.119.82.164' waiting to handoff 40 partitions

Any ideas on how to get the cluster to remove this node?
Thanks,
Paul
Re: Dead node not leaving with force-remove
At 2011-11-28T17:19+, Paul Armstrong wrote:
> We have a 1.0.2 cluster with a node that's gone but still listed in
> member_status (as legacy):
> [member_status output snipped]

Many thanks to the Basho team (Dan Reverri and Mark Phillips in particular) for helping to solve this. The ring was corrupted, and there was an interesting bug around legacy gossip and forced removal (see bug 1298, "Legacy gossip / force-remove troubles": https://issues.basho.com/show_bug.cgi?id=1298). After some Erlang console work, our ring no longer had the ghost hosts in it, the cluster settled into the new gossip mode, and we were able to shrink it.

Paul
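For the archive: the Erlang console work was essentially inspecting the ring on a live node and, once the fix was in, confirming the ghost member was gone. This is a rough, illustrative sketch only (the actual ring repair was done with Basho's guidance per bug 1298), typed at a riak attach console:

    %% fetch this node's current view of the ring
    {ok, Ring} = riak_core_ring_manager:get_my_ring().
    %% list the ring members; the dead 'riak@10.115.13.51' should no longer appear here
    riak_core_ring:all_members(Ring).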
Re: Riak fails to start on Ubuntu 11.10
At 2011-12-21T09:49+0800, Zheng Zhibin wrote:
> Try setting the permissions of that folder to 777. I have also encountered
> this kind of error; it happens when the user that starts Riak has changed.

If the user changed, change the user and/or group on the directory with chown. Using permissions of 777 is incredibly insecure and should be avoided.

Paul
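For example, assuming a stock package install where Riak's data and log directories are /var/lib/riak and /var/log/riak and the service runs as the riak user (adjust the paths, user, and group to match your system), that would be roughly:

    # give ownership of Riak's data and log directories back to the riak user
    chown -R riak:riak /var/lib/riak /var/log/riak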
Re: SSDs instead of SAS.
At 2012-01-09T15:12-0800, Jeremiah Peschka wrote:
> However, make sure you do the reading on the SSDs you're going to
> purchase because not all SSDs are created equal. I had a client buy
> some smaller OCZ Vertex 3s recently which have a wear-leveling issue
> (the 120GB drives have fewer chips than the 240GB drives) that causes
> performance to fall apart pretty quickly.

In general, for production you want to use SLC or eMLC drives (which are quite a bit more expensive than MLC but also have fewer degradation issues). Another way to significantly improve your I/O situation is to use an OS with ZFS (such as SmartOS, IllumOS, or Solaris), as these can be configured to use the SSDs as a large cache for reads and write-backs in front of slower storage such as SAS or SATA.

Paul
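As a rough illustration (the pool name "tank" and the device names are placeholders, not anything from this thread), adding an SSD as an L2ARC read cache and another as a separate log device to an existing pool looks like:

    # SSD as a read cache (L2ARC) for the pool
    zpool add tank cache c1t2d0
    # SSD as a separate intent-log (ZIL) device for synchronous writes
    zpool add tank log c1t3d0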
Re: Issues regarding the installation of Riak in EC2
At 2012-01-31T17:53-0500, Jeff Kirkell wrote:
> You also need a security rule to allow the handoff port between the
> two instances and not the outside world. I apologize for not recalling
> how to do it off the top of my head, but as Carl said, it is in the
> security firewall rules on the EC2 dashboard.

AWS Management Console -> EC2 -> Network & Security -> Security Groups
Select group -> Inbound -> Add port (e.g. 8098) -> Add source -> Add Rule

Note that the source can also be the name of a security group, so if all your Riak hosts are in the "riak" security group you can use that instead of an IP address. Click "Apply Rule Changes" when you've added all of them.

In general, it's better to use the CLI tool chains (either the Amazon one or the Boto one), but this will get things going.

Paul
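If you'd rather script it, here is a rough sketch using the classic Amazon EC2 API tools. The group name "riak", the example port, and the account-ID placeholder are assumptions, and the flags are from memory, so check ec2-authorize's help output before relying on it:

    # allow Riak's HTTP port (8098) from other members of the "riak" group
    ec2-authorize riak -P tcp -p 8098 -o riak -u <your-aws-account-id>
    # repeat for the handoff port (8099 by default) and the protocol buffers port (8087) as needed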
Re: A couple of questions about Riak
On 16/02/2012, at 1:18, Aphyr wrote:
> On 02/16/2012 01:07 AM, Jerome Renard wrote:
>> - Will it be a problem if I decide to run Riak on ZFS + compression enabled?
>
> I suspect it would work quite well. If you try it, please report back!

It does work very well. Make sure you use SSD for your ZIL even if you're not using it elsewhere. Obviously, you'll want extra CPU lying around for the compression...

Paul
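As a minimal sketch, assuming the Riak data directory lives on a ZFS dataset named tank/riak (the dataset name is a placeholder; lzjb was the usual compression choice on 2012-era ZFS):

    # enable compression on the dataset holding Riak's data
    zfs set compression=lzjb tank/riak
    # later, check how well the data is compressing
    zfs get compressratio tank/riak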