Random server restarts, swap and moving nodes to new hardware

2011-08-16 Thread Jeff Pollard
Hello everyone,

We've got a very interesting problem.  We're hosting our 5-node cluster on EC2
running Ubuntu 10.04 LTS (Lucid Lynx) Server 64-bit using m2.xlarge instance
types, and over the past 5 days we've had two EC2 servers randomly restart on
us.  We've checked the logs and found nothing indicating why they restarted.
One second they were happily logging and the next second the server was in
the process of rebooting.  This is particularly bad because every time the
node comes back up we get merge errors due to an existing bug in Riak and
have to restore from a recent backup.

Just today we noticed that the EC2 servers did not have swap enabled
(apparently the norm for xlarge+ instances), which we thought might have
been our problem?  My knowledge of what happens when swap is off is pretty
poor - but I have been told that the Linux OOM killer should still be
invoked and start trying to kill processes, rather than the server simply
restarting.  Is that correct?  Also, how would Riak hypothetically handle
swap being off on a system?  We're using Bitcask if that helps.
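
A quick way to confirm whether swap is enabled is to read SwapTotal from /proc/meminfo (the same figure that free and swapon -s report).  By default the OOM killer kills processes rather than rebooting the machine (unless vm.panic_on_oom is set), and it normally leaves a message in the kernel log.  The Python sketch below is illustrative only; the function names are hypothetical.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Return SwapTotal in kB from text in /proc/meminfo format."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Lines look like "SwapTotal:       0 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


def swap_enabled(meminfo_text: str) -> bool:
    """True if any swap space is configured."""
    return swap_total_kb(meminfo_text) > 0


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the live values (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a live server, swap_enabled(read_meminfo()) answers the question directly.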

Secondly, one of our ops guys here thinks the issue might be related to a
bug (?) that other Ubuntu users of the same version seem to have.  In fact, we
do see the same "INFO: task cron:15047 blocked for more than 120 seconds" line
in our log file.  We're also running an AMI that isn't the official one from
Canonical, so the thinking is that upgrading to the official AMI would help.
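
To see how often that hung-task warning occurs, and whether it clusters around the restarts, one could scan the kernel log for it.  A minimal Python sketch, assuming log lines in the format quoted above; the log location (for example /var/log/kern.log on Ubuntu) and the function name are assumptions.

```python
import re
from typing import Iterable, List, Tuple

# Matches kernel hung-task warnings such as:
#   INFO: task cron:15047 blocked for more than 120 seconds.
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (task name, pid, seconds) for each hung-task warning found."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return hits
```

Passing an open file handle for the kernel log to find_hung_tasks() yields every occurrence, which can then be compared with the restart times.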

If we do want to upgrade, it will mean moving each cluster node to new
hardware.  I wanted to ask the list to make sure we were doing it correctly.
 Here is the plan to transfer a node to new hardware -- note that these
steps will be done on one node at a time, and we'll make sure the cluster
has stabilized after doing one node before moving on to the next one.

   1. Stop riak on old server.
   2. Copy data directory (including bitcask, mr_queue and ring folders) to
   a shared location.
   3. Shutdown old server.
   4. Boot new replacement server, installing (but not starting) Riak.
   5. Transfer data directory from shared location to data folder on new
   server.
   6. Start riak.
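
The six steps above can be written down as an ordered command list for one node.  This is a sketch under stated assumptions, not a tested procedure: /var/lib/riak as the data directory (common for the Ubuntu package, but check data_root and ring_state_dir in your app.config), /mnt/ebs/riak-backup as a hypothetical shared location, Riak running as the riak user, and riak-admin ringready being available in your Riak version.  It only builds and prints the commands; nothing is executed.

```python
from typing import List, Tuple

# Hypothetical paths: adjust to your app.config and mount points.
DATA_DIR = "/var/lib/riak"
SHARED_DIR = "/mnt/ebs/riak-backup"


def migration_plan(data_dir: str = DATA_DIR,
                   shared_dir: str = SHARED_DIR) -> List[Tuple[str, List[str]]]:
    """Return (where, argv) pairs for moving one node; nothing is executed."""
    return [
        # Step 1: stop Riak so the Bitcask files and ring file are quiescent.
        ("old", ["riak", "stop"]),
        # Step 2: copy the whole data directory (bitcask/, mr_queue/, ring/).
        ("old", ["rsync", "-a", data_dir + "/", shared_dir + "/"]),
        # Steps 3-4 (shut down the old server, install Riak on the new one
        # without starting it) happen outside this list.
        # Step 5: restore the data onto the new server's local disk.
        ("new", ["rsync", "-a", shared_dir + "/", data_dir + "/"]),
        # Assumption: the package runs Riak as the "riak" user.
        ("new", ["chown", "-R", "riak:riak", data_dir]),
        # Step 6: start Riak and confirm it is up before moving to the next node.
        ("new", ["riak", "start"]),
        ("new", ["riak", "ping"]),
        ("new", ["riak-admin", "ringready"]),
    ]


for where, argv in migration_plan():
    print(f"[{where}] {' '.join(argv)}")
```

Shutting the old server down before the new one starts means two machines never run under the same node name at the same time.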

My main concern is whether the ring state will transfer safely to a new node,
assuming the new server has the same hostname and node name as the old
server.  The new server will have a different IP address, but all of the node
names in our cluster use hostnames, and those will not be changing.
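
On the ring question: Riak's ring records cluster members by Erlang node name (the -name setting in vm.args, e.g. riak@hostname), not by IP address, so what matters is that the node name stays the same and that the hostname resolves to the new address on every node.  The hypothetical helpers below (a sketch, not part of Riak) show the distinction: a node name whose host part is an IP literal would change on new hardware, while a hostname-based one would not.

```python
import ipaddress


def node_host(node_name: str) -> str:
    """Return the host part of an Erlang node name such as 'riak@db1.example.com'."""
    _, sep, host = node_name.partition("@")
    if not sep or not host:
        raise ValueError(f"not a node name: {node_name!r}")
    return host


def survives_ip_change(node_name: str) -> bool:
    """True if the node name uses a hostname rather than an IP literal."""
    host = node_host(node_name)
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True   # hostname: unchanged as long as DNS/hosts point to the new IP
    return False      # IP literal: the node name itself changes with the address
```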
riak-users mailing list

Re: Random server restarts, swap and moving nodes to new hardware

2011-08-16 Thread Sean Cribbs

Jeff,

We highly recommend you upgrade to 10.10 or later. 10.04 has some known
problems when running under Xen (especially on EC2) -- in some cases under
load, the network interface will break, making the node temporarily
inaccessible.
When you do upgrade, the simplest way (if possible) would be to remount the
attached EBS volumes where your Riak data is stored onto the new nodes.
Otherwise, the steps you list are correct.

Regarding swap, whether you have it on or not is a personal decision. Riak
will "do the right thing" and exit when it can't allocate more memory,
allowing you to figure out what went wrong -- as opposed to grinding the
machine into IO oblivion while consuming more and more swap.  That said, in
some deployments (notably not on EC2), swap can be helpful.

Hope that helps,

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.

Re: Random server restarts, swap and moving nodes to new hardware

2011-08-16 Thread Jeff Pollard
Hey Sean,

Thanks very much for the reply.  I'll certainly try going to 10.10 with the
upgrade, that's good info.

Re EBS: were you saying to attach the EBS volume to the new node and use the
EBS volume as the data volume?  We had been using EBS as a backup volume, but
using ephemeral storage for the actual data directory.  We did this primarily
for I/O performance reasons and also because EBS seems to have had a bad
operational track record at AWS.  In the steps from my previous email, our
"shared location" was actually EBS, but we were planning on copying the files
to the ephemeral disk and using that as the data volume for Riak.  Does that
make sense?

Re: Random server restarts, swap and moving nodes to new hardware

2011-08-16 Thread Sean Cribbs

That should be fine. I assumed you were using EBS as your data volume
because most people we talk to do (for better or worse).

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.

Riak memory usage higher than expected

2011-08-16 Thread Jacques
We're using Riak 0.14.2 and we're seeing higher memory consumption than we
expected.

We're running a 4-node cluster with 32 GB of memory per node, using Bitcask
with a 3x write replication factor.  Memory is growing faster than we expect,
and we're also seeing odd upward bounces.

You can see an example chart.  (Note: there are some times where we've had
to stop the job for short periods; these show up as flat spots.)

We are doing a large throttled import with keys of approximately 12 bytes.
We're currently at around 450 million unique items and Riak memory
consumption is ~110 GB.  The import job is roughly 95% new puts and 5%
overwriting puts.

According to the capacity planner tools, our keyspace should need only about
half of our actual memory consumption.
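
Bitcask keeps every key in memory (its keydir), so RAM grows with the number of keys, times the number of replicas, times a per-key cost of roughly a fixed overhead plus the bucket and key lengths; that is the shape of the estimate the capacity planner makes.  The sketch below does that arithmetic.  The per-key overhead and average bucket-name length in the example call are placeholders, not Bitcask's real figures, so substitute the values the capacity planner uses for your version.

```python
def bitcask_keydir_bytes(num_keys: int, n_val: int, avg_key_len: int,
                         avg_bucket_len: int, per_key_overhead: int) -> int:
    """Cluster-wide keydir estimate: keys x replicas x per-key cost."""
    return num_keys * n_val * (per_key_overhead + avg_bucket_len + avg_key_len)


def per_node_gb(total_bytes: int, num_nodes: int) -> float:
    """Spread the cluster total evenly across nodes, in decimal GB."""
    return total_bytes / num_nodes / 1e9


# Figures from this thread: ~450 million keys, ~12-byte keys, 3 replicas, 4 nodes.
# PLACEHOLDERS (not real Bitcask numbers): 40 bytes overhead, 8-byte bucket names.
total = bitcask_keydir_bytes(num_keys=450_000_000, n_val=3, avg_key_len=12,
                             avg_bucket_len=8, per_key_overhead=40)
print(f"cluster: {total / 1e9:.1f} GB, per node: {per_node_gb(total, 4):.2f} GB")
```

Because each key is stored n_val times and the replicas are spread across all nodes, each of the 4 nodes holds about 3/4 of all keys, which is why the per-node figure is simply the cluster total divided by the node count.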

As you can see in the chart, we're also seeing jumps in memory size at
random intervals.  What might these be?  There's nothing interesting in the
logs that I can see, just regular merges.
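
To line the jumps up against merges or other events in the logs, it may help to pull their exact times out of the sampled memory data.  A minimal sketch, assuming the samples are (timestamp, memory) pairs exported from whatever produced the chart; the function name and threshold are hypothetical.  It only locates the jumps; it does not explain them.

```python
from typing import List, Sequence, Tuple


def find_jumps(samples: Sequence[Tuple[int, float]],
               threshold: float) -> List[Tuple[int, float]]:
    """Return (timestamp, delta) wherever memory grows by more than
    `threshold` between two consecutive samples."""
    jumps = []
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        delta = m1 - m0
        if delta > threshold:
            jumps.append((t1, delta))
    return jumps
```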

Here is a close-up of a recent jump in memory consumption for one of the
nodes (they all look the same).  There are no corresponding patterns in the
CPU chart.  Things are pretty flat, although we sometimes see more I/O wait
than we'd like (we need more spindles).
Any helpful thoughts?
