I am currently evaluating riak. I'd like to be able to do periodic snapshots of 
/var/lib/riak using LVM without stopping the node. According to a response on 
this ML, you should be able to copy the data directory for the eleveldb backend:

http://comments.gmane.org/gmane.comp.db.riak.user/5202


If I cycle through each node and do `riak stop` before taking a snapshot, 
everything works fine. But if I don't shut down the node before copying, I run 
into problems (details below). Stopping the node isn't free either: since I 
access the HTTP interface of the cluster through an haproxy load balancer, the 
node is taken out of the pool almost immediately once it goes down, but for a 
millisecond or two before haproxy detects the failure there might be some bad 
responses. I can live with that and build better retries into my client, but I 
would rather avoid it if I can.
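
For concreteness, the sort of per-node cycle I have in mind is roughly the 
following (the volume group, snapshot name, and backup paths are just 
placeholders for my layout). Today the stop/start lines are in there; what I'd 
like is to be able to drop them:

----
# vg_riak / riak_snap / /backup/riak are placeholders for my setup
# stop riak briefly, snapshot the LV backing /var/lib/riak, start riak again
/etc/init.d/riak stop
lvcreate --snapshot --size 5G --name riak_snap /dev/vg_riak/riak
/etc/init.d/riak start

# back up from the snapshot at leisure, then drop it
mkdir -p /mnt/riak_snap
mount /dev/vg_riak/riak_snap /mnt/riak_snap
rsync -a /mnt/riak_snap/ /backup/riak/
umount /mnt/riak_snap
lvremove -f /dev/vg_riak/riak_snap
----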

More details below.

Thanks for the help!

~ John Loehrer
Gaia Interactive INC 


DETAILS
------------------
I am playing with the idea of being able to bring up a standby cluster on an 
alternate port on the same servers, pointing at an hourly snapshot of my 
choosing, so that I can go back in time and review the data for recovery and 
repair purposes.

Here's what I have so far. 

I have a small cluster of 4 nodes on CentOS 5.4, using the eleveldb backend so I 
can take advantage of 2i (very cool feature, btw). 

Steps for installation:


----
# install the riak rpm ...
yum install riak-1.0.2-1.el5.x86_64.rpm 

# get the ip address out of ifconfig
IPADDR=`ifconfig eth0 | grep "inet addr" | awk '{ print $2 }' | awk 'BEGIN { FS=":" } { print $2 }'`

# replace the loopback ip address in app.config and vm.args with the machine's ip
perl -pi -e "s/127\.0\.0\.1/$IPADDR/g" /etc/riak/*

# change the storage backend to eleveldb
perl -pi -e 's/riak_kv_bitcask_backend/riak_kv_eleveldb_backend/g' /etc/riak/app.config
----
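
(Not really part of the install, but a quick sanity check I run afterwards to 
make sure the substitutions took and the node comes up:)

----
# the backend and IP substitutions should show up in the configs
grep riak_kv_eleveldb_backend /etc/riak/app.config
grep "$IPADDR" /etc/riak/vm.args /etc/riak/app.config

# the node should start and answer a ping
riak start && sleep 5 && riak ping    # expect "pong"
----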

We also mount an LVM partition at /var/lib/riak so we can snapshot the data 
directory and back it up using rsnapshot once per hour. rsnapshot hard-links 
every file that is unchanged since the previous snapshot, which makes for very 
efficient storage, and since the append-only approach of the leveldb and 
bitcask backends means a file is immutable once it is closed, rsnapshot only 
has to rsync over the files that have changed since the previous run. Hourly 
snapshots take up only a little more storage space than the original, even if 
I populate the cluster with hundreds of millions of keys over the course of a 
24-hour period, and the backup operation takes only a few seconds even for 
50GB of data. Now I can copy the data in an hourly snapshot directory to my 
standby riak node, reip, and start up a standby cluster on the same machines; 
pointing at an hourly snapshot and starting the node up takes only a second or 
two as well. 
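
In case it helps, here is roughly what that hourly rotate/hardlink/rsync cycle 
boils down to. This is a simplified sketch of the idea, not rsnapshot's exact 
internals, and the paths match my layout:

----
# rotate the old snapshots, hard-link the newest one (near-instant, no extra
# space), then rsync only the files that changed since the last run
# (simplified -- real rsnapshot rotates more levels and drives this from config)
mv /.snapshots/hourly.1 /.snapshots/hourly.2 2>/dev/null
cp -al /.snapshots/hourly.0 /.snapshots/hourly.1
rsync -a --delete /var/lib/riak/ /.snapshots/hourly.0/riak/
----

Because leveldb and bitcask files are immutable once closed, almost everything 
ends up hard-linked and only the newest files actually get copied.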

Steps for creating the standby node on the same machine:

----

# make the root directory of the standby node in the snapshots directory
# so that we can hard-link to the hourly snapshots dir for a quick restore.
mkdir /.snapshots/riak-standby

# create a handy symlink for the standby node root dir ... 
# we'll use /riak-standby from now on.
ln -s /.snapshots/riak-standby /riak-standby

# create the default directory structure
mkdir -p /riak-standby/bin/
mkdir -p /riak-standby/etc/
mkdir -p /riak-standby/data

# we are going to use the same libraries, so symlink that in place.
ln -s /usr/lib64/riak/* /riak-standby/

# copy the app.config and vm.args files from the live node
cp /etc/riak/app.config /riak-standby/etc/app.config
cp /etc/riak/vm.args /riak-standby/etc/vm.args

# now, we need to make the app.config file work for the standby node.
# change /var/lib/riak to ./data
perl -pi -e 's/\/var\/lib\/riak/.\/data/g' /riak-standby/etc/app.config

# change /usr/sbin to ./bin
perl -pi -e 's/\/usr\/sbin/.\/bin/g' /riak-standby/etc/app.config

# change /usr/lib64/riak to ./lib
perl -pi -e 's/\/usr\/lib64\/riak/.\/lib/g' /riak-standby/etc/app.config

# change /var/log/riak to ./log
perl -pi -e 's/\/var\/log\/riak/.\/log/g' /riak-standby/etc/app.config

# change all the ports from 80** to 81**
perl -pi -e 's/\b80(\d\d)\b/81$1/g' /riak-standby/etc/app.config

# change the cookie and node names in vm.args
perl -pi -e 's/riak@/stby@/g' /riak-standby/etc/vm.args
perl -pi -e 's/setcookie riak/setcookie stby/g' /riak-standby/etc/vm.args

# fix any permission issues.
chown -R riak:riak /.snapshots/riak-standby
----
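
Before starting anything I double-check that the rewrites above actually took 
(just my own paranoia, not part of the setup):

----
# no absolute live-node paths should be left over in the standby app.config
# (this should print nothing)
grep -nE '/var/lib/riak|/usr/sbin|/usr/lib64/riak|/var/log/riak' /riak-standby/etc/app.config

# the node name and cookie in vm.args should now say stby
grep -nE '^-(name|setcookie)' /riak-standby/etc/vm.args
----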

The riak script in /riak-standby/bin/riak is almost the same as the default one 
installed in /usr/sbin/riak:

diff /usr/sbin/riak /riak-standby/bin/riak 
3a4
> ## MANAGED BY PUPPET.
5c6
< RUNNER_SCRIPT_DIR=/usr/sbin
---
> RUNNER_SCRIPT_DIR=$(cd ${0%/*} && pwd)
8,11c9,12
< RUNNER_BASE_DIR=/usr/lib64/riak
< RUNNER_ETC_DIR=/etc/riak
< RUNNER_LOG_DIR=/var/log/riak
< PIPE_DIR=/var/run/riak/
---
> RUNNER_BASE_DIR=${RUNNER_SCRIPT_DIR%/*}
> RUNNER_ETC_DIR=$RUNNER_BASE_DIR/etc
> RUNNER_LOG_DIR=$RUNNER_BASE_DIR/log
> PIPE_DIR=/tmp/$RUNNER_BASE_DIR/
13c14
< PLATFORM_DATA_DIR=/var/lib/riak
---
> PLATFORM_DATA_DIR=./data


Same is true of the riak-admin script for the standby node:

diff /usr/sbin/riak-admin /riak-standby/bin/riak-admin
1a2
> ## MANAGED BY PUPPET.
3c4
< RUNNER_SCRIPT_DIR=/usr/sbin
---
> RUNNER_SCRIPT_DIR=$(cd ${0%/*} && pwd)
6,8c7,9
< RUNNER_BASE_DIR=/usr/lib64/riak
< RUNNER_ETC_DIR=/etc/riak
< RUNNER_LOG_DIR=/var/log/riak
---
> RUNNER_BASE_DIR=${RUNNER_SCRIPT_DIR%/*}
> RUNNER_ETC_DIR=$RUNNER_BASE_DIR/etc
> RUNNER_LOG_DIR=$RUNNER_BASE_DIR/log


After that, I expected to be able to just copy the data from the snapshots, 
reip, and start up my standby cluster: 


rm -rf /riak-standby/data && cp -al /.snapshots/hourly.0/riak/ /riak-standby/data
/riak-standby/bin/riak-admin reip riak@<ip1> stby@<ip1>
/riak-standby/bin/riak-admin reip riak@<ip2> stby@<ip2>
/riak-standby/bin/riak-admin reip riak@<ip3> stby@<ip3>
/riak-standby/bin/riak-admin reip riak@<ip4> stby@<ip4>
/riak-standby/bin/riak start


But when I did, /riak-standby/bin/riak-admin ring_status showed the claimant as 
riak@<ip1>, not stby@<ip1> as I expected. 

Instead of doing reip, I did a binary-safe replacement of riak@ with stby@:

perl -pi -e 's/riak@/stby@/g' /riak-standby/data/ring/riak_core_ring.default.*

When the nodes start up, the claimant looks correct and all the nodes join 
together just fine. 
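
A couple of checks I added on top of that (my own additions, not from any 
docs): confirm that no old node names are left in the copied ring file, and 
confirm the claimant after startup.

----
# should print 0 -- every riak@ atom in the ring file was rewritten to stby@
strings /riak-standby/data/ring/riak_core_ring.default.* | grep -c 'riak@'

# the claimant line should now show one of the stby@ nodes
/riak-standby/bin/riak-admin ring_status | grep -i claimant
----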

But I still have the problem where the data directory fills up even though 
nothing is actively being written to the standby cluster. I left it alone for 5 
or 6 hours and it eventually filled up an entire TB of storage. 
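
(For what it's worth, I'm measuring that growth with nothing fancier than the 
following:)

----
# watch the standby data directory growing in near real time
watch -n 60 'du -sh /riak-standby/data /riak-standby/data/*'
----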

I noticed that `riak-admin transfers` starts off showing one partition waiting 
to hand off:

 /riak-standby/bin/riak-admin transfers  
'stby@<ip1>' waiting to handoff 1 partitions

This usually clears up after a minute or so. Not sure if it is related.


No clues in the console log; the entries all look something like:

2011-12-09 19:10:33.371 [info] <0.7.0> Application bitcask started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.388 [info] <0.7.0> Application riak_kv started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.388 [info] <0.7.0> Application skerl started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.391 [info] <0.7.0> Application luwak started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.402 [info] <0.7.0> Application merge_index started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.405 [info] <0.7.0> Application riak_search started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.405 [info] <0.7.0> Application basho_stats started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.419 [info] <0.7.0> Application runtime_tools started on 
node 'stby@192.168.3.94'
2011-12-09 19:10:33.419 [info] <0.7.0> Application public_key started on node 
'stby@192.168.3.94'
2011-12-09 19:10:33.447 [info] <0.7.0> Application ssl started on node 
'stby@192.168.3.94'



If I turn off the node before taking the snapshot, everything works fine:

/etc/init.d/riak stop
 .... do backup here
/etc/init.d/riak start


But the standby data directory starts filling up at the rate of about 500 MB a 
second on some of the nodes if I do a copy without first stopping riak. I know 
this is not a supported approach, but I was curious if someone might be able to 
shed some light on what might be happening.


Ideas?


Thanks for any insight.






