Dead node not leaving with force-remove

2011-11-28 Thread Paul Armstrong
We have a 1.0.2 cluster with a node that's gone but still listed in
member_status (as legacy):

riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status      Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)7.8%  --  'riak@10.115.13.51'
valid   7.8%  --  'riak@10.119.82.164'
valid   7.8%  --  'riak@10.13.22.183'
valid   7.8%  --  'riak@10.13.51.171'
valid   7.8%  --  'riak@10.37.46.7'
valid   7.6%  --  'riak@10.76.45.111'
valid   7.6%  --  'riak@10.76.62.122'
valid   7.6%  --  'riak@10.78.197.82'
valid   7.6%  --  'riak@10.79.69.234'
valid   7.6%  --  'riak@10.80.157.112'
valid   7.6%  --  'riak@10.82.155.202'
valid   7.6%  --  'riak@10.82.25.84'
valid   7.6%  --  'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0


riak-admin force-remove does not remove the node (we've tried a few
times over the last 4 days).

This node was down before we did the upgrade but wasn't being removed, so
we upgraded anyway. As you can see here, force-remove reports success,
but the node is still listed:

riak-admin force-remove 'riak@10.115.13.51'
Attempting to restart script through sudo -u riak
Success: "riak@10.115.13.51" removed from the cluster

riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status      Ring    Pending    Node
-------------------------------------------------------------------------------
(legacy)7.8%  --  'riak@10.115.13.51'
valid   7.8%  --  'riak@10.119.82.164'
valid   7.8%  --  'riak@10.13.22.183'
valid   7.8%  --  'riak@10.13.51.171'
valid   7.8%  --  'riak@10.37.46.7'
valid   7.6%  --  'riak@10.76.45.111'
valid   7.6%  --  'riak@10.76.62.122'
valid   7.6%  --  'riak@10.78.197.82'
valid   7.6%  --  'riak@10.79.69.234'
valid   7.6%  --  'riak@10.80.157.112'
valid   7.6%  --  'riak@10.82.155.202'
valid   7.6%  --  'riak@10.82.25.84'
valid   7.6%  --  'riak@10.84.5.68'
-------------------------------------------------------------------------------
Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

After this, a small number of handoffs are seen in the logs:

17:13:29.514 [info] Handoff of partition riak_kv_vnode
1033327329519114953886199041881434478741108555776 from
'riak@10.13.51.171' to 'riak@10.82.25.84' completed: sent 2 objects in
0.04 seconds
17:13:39.470 [info] Starting handoff of partition riak_kv_vnode
176978713895539025251227460211737396911460581376 from
'riak@10.13.51.171' to 'riak@10.82.155.202'
17:13:39.480 [info] Starting handoff of partition riak_kv_vnode
1238850997268773176758592221482161778380224069632 from
'riak@10.13.51.171' to 'riak@10.76.45.111'
17:13:39.512 [info] Handoff of partition riak_kv_vnode
176978713895539025251227460211737396911460581376 from
'riak@10.13.51.171' to 'riak@10.82.155.202' completed: sent 1 objects in
0.04 seconds
17:13:39.525 [info] Handoff of partition riak_kv_vnode
1238850997268773176758592221482161778380224069632 from
'riak@10.13.51.171' to 'riak@10.76.45.111' completed: sent 3 objects in
0.04 seconds

Here's the pending transfer list:

riak-admin transfers
Nodes ['riak@10.115.13.51'] are currently down.
'riak@10.84.5.68' waiting to handoff 28 partitions
'riak@10.82.25.84' waiting to handoff 40 partitions
'riak@10.82.155.202' waiting to handoff 40 partitions
'riak@10.80.157.112' waiting to handoff 40 partitions
'riak@10.79.69.234' waiting to handoff 40 partitions
'riak@10.78.197.82' waiting to handoff 39 partitions
'riak@10.76.62.122' waiting to handoff 40 partitions
'riak@10.76.45.111' waiting to handoff 40 partitions
'riak@10.37.46.7' waiting to handoff 40 partitions
'riak@10.13.51.171' waiting to handoff 40 partitions
'riak@10.13.22.183' waiting to handoff 40 partitions
'riak@10.119.82.164' waiting to handoff 40 partitions

Any ideas on how to get the cluster to remove this node?

Thanks,
Paul

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Dead node not leaving with force-remove

2011-12-05 Thread Paul Armstrong
At 2011-11-28T17:19+, Paul Armstrong wrote:
> We have a 1.0.2 cluster with a node that's gone but still listed in
> member_status (as legacy):
> 
> riak-admin member_status
> Attempting to restart script through sudo -u riak
> ================================= Membership ==================================
> Status      Ring    Pending    Node
> -------------------------------------------------------------------------------
> (legacy)7.8%  --  'riak@10.115.13.51'
> valid   7.8%  --  'riak@10.119.82.164'
> valid   7.8%  --  'riak@10.13.22.183'
> valid   7.8%  --  'riak@10.13.51.171'
> valid   7.8%  --  'riak@10.37.46.7'
> valid   7.6%  --  'riak@10.76.45.111'
> valid   7.6%  --  'riak@10.76.62.122'
> valid   7.6%  --  'riak@10.78.197.82'
> valid   7.6%  --  'riak@10.79.69.234'
> valid   7.6%  --  'riak@10.80.157.112'
> valid   7.6%  --  'riak@10.82.155.202'
> valid   7.6%  --  'riak@10.82.25.84'
> valid   7.6%  --  'riak@10.84.5.68'
> -------------------------------------------------------------------------------
> Valid:13 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Many thanks to the Basho team (Dan Reverri and Mark Phillips in
particular) for helping to solve this. The ring was corrupted and there
was an interesting bug around legacy gossip and forced removal (see
bug 1298, "Legacy gossip / force-remove troubles":
https://issues.basho.com/show_bug.cgi?id=1298 ).

After some Erlang console work, our ring no longer had the ghost hosts
in it, the cluster was able to settle into the new gossip mode, and we
were able to shrink the ring.

Paul



Re: Riak fails to start on ubuntu 11.10

2011-12-20 Thread Paul Armstrong
At 2011-12-21T09:49+0800, Zheng Zhibin wrote:
> try to set permission of that folder as 777,  I have also encountered
> this kind of error, it is due to user of starting riak changed.

If the user changed, change the user and/or group on the directory with
chown rather than loosening permissions. Mode 777 is incredibly insecure
and should be avoided.
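For example, assuming the default package layout where Riak's data and log
directories live under /var/lib/riak and /var/log/riak and the service user
is "riak" (paths and user are assumptions; check your install):

```shell
# Restore ownership to the riak service user instead of opening permissions.
sudo chown -R riak:riak /var/lib/riak /var/log/riak
```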

Paul



Re: SSDs instead of SAS.

2012-01-09 Thread Paul Armstrong
At 2012-01-09T15:12-0800, Jeremiah Peschka wrote:
>However, make sure you do the reading on the SSDs you're going to
>purchase because not all SSDs are created equal. I had a client buy
>some smaller OCZ-Vertex 3s recently which have a wear leveling issue
>(the 120GB drives have fewer chips than the 240GB drives) that causes
>performance to fall apart pretty quickly.

In general, for production you want to use SLC or eMLC drives (which are
quite a bit more expensive than MLC but also have fewer degradation
issues).

Another way you can improve your life significantly I/O-wise is to use
an OS with ZFS (such as SmartOS, IllumOS or Solaris), as these can be
configured to use the SSDs as a large read cache (L2ARC) and as an
intent-log device (ZIL/SLOG) in front of slower SAS or SATA disks.
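As a sketch, adding SSDs in those roles is just attaching cache and log
devices to an existing pool (the pool and device names here are made up;
substitute your own):

```shell
# Hypothetical pool "tank" and Solaris-style device names.
zpool add tank cache c1t2d0   # SSD as L2ARC (read cache)
zpool add tank log c1t3d0     # SSD as ZIL/SLOG (synchronous write log)
zpool status tank             # verify the cache and log vdevs appear
```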

Paul



Re: Issues regarding the installation of Riak in EC2

2012-01-31 Thread Paul Armstrong
At 2012-01-31T17:53-0500, Jeff Kirkell wrote:
>You also need a security rule to allow the handoff port between the
>two instances and not the outside world. I apologies for not recalling
>how to do off the top of my head but as Carl said, it is in the
>security firewall rules on the EC2 dashboard.

AWS Management Console -> EC2 -> Network & Security -> Security Groups

Select group -> Inbound -> Add port (e.g. 8098) -> Add source -> Add Rule
Note that the source can also be the name of a security group, so if all
your riak hosts have the "riak" security group you can use that instead
of an IP address.

Click "Apply Rule Changes" when you've done all of them.

In general, it's better to use the CLI tool chains (either the Amazon
one or the Boto one), but this will get things going.
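For instance, with the classic EC2 API tools the same rule can be added from
a shell (the "riak" group name follows the example above; the account ID is
a placeholder):

```shell
# Allow the Riak HTTP port (8098) from instances in the "riak" security
# group. 111122223333 is a placeholder AWS account ID (the group's owner).
ec2-authorize riak -P tcp -p 8098 -o riak -u 111122223333
```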

Paul



Re: A couple of questions about Riak

2012-02-16 Thread Paul Armstrong
On 16/02/2012, at 1:18, Aphyr wrote:
> On 02/16/2012 01:07 AM, Jerome Renard wrote:
>> - Will it be a problem if I decide to run Riak on ZFS + compression enabled ?
> I suspect it would work quite well. If you try it, please report back!

It does work very well. Make sure you use an SSD for your ZIL even if
you're not using SSDs elsewhere. Obviously, you'll want extra CPU lying
around for the compression...
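A minimal sketch of enabling compression on a dataset for Riak's data
(pool and dataset names are assumptions; adjust to your layout):

```shell
# Hypothetical pool/dataset; lzjb was the usual choice in this era.
zfs create tank/riak
zfs set compression=lzjb tank/riak
zfs get compressratio tank/riak   # check how well your data compresses
```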

Paul

