Riak nodes (potentially) have empty partitions
Hi,

We have a Riak cluster running 1.1.2. Every node owns 6.6% of the partitions, yet the memory usage of the "beam" process is significantly lower on some nodes than on others. The nodes using less memory also hold fewer keys. How can they still own the same number of partitions as the other nodes? Could it be that some of their partitions are empty?

Any suggestions?

Cheers
Nitish
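One way to put numbers on the imbalance is to pull /stats from every node over HTTP and compare the figures side by side. The following is a minimal sketch only: it assumes the default HTTP port 8098, placeholder hostnames, and that your build's stats JSON exposes memory_total and ring_num_partitions (field names can vary between Riak versions, so check your own /stats output).

# compare_stats.py -- quick look at memory across Riak nodes (Python 2.7)
import json
import urllib2

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # placeholder: one entry per Riak node

for host in NODES:
    url = "http://%s:8098/stats" % host              # default Riak HTTP port
    stats = json.loads(urllib2.urlopen(url, timeout=10).read())
    # memory_total is reported in bytes; ring_num_partitions is cluster-wide.
    print("%s  memory_total=%s MB  ring_num_partitions=%s" % (
        host,
        stats.get("memory_total", 0) / (1024 * 1024),
        stats.get("ring_num_partitions")))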
Re: Riak nodes (potentially) have empty partitions
Hi Mark,

riak-admin reports 6 of the 15 nodes with 25-30% free memory, 3 nodes with 8-10%, and 6 nodes with 18-20%. This unequal distribution forces us to add new nodes relatively quickly.

Cheers
Nitish

On Jun 7, 2012, at 12:20 AM, Mark Phillips wrote:
> Hi Nitish,
>
> [original question snipped]
>
> Can you add some more color to "significantly"? That would help us diagnose things.
>
> Mark
Re: Riak nodes (potentially) have empty partitions
Hi Jared,

We have a 15-node Riak (Bitcask) cluster containing almost 2.8 billion keys.

Cheers
Nitish

On Jun 7, 2012, at 1:19 AM, Jared Morrow wrote:
> Also, how many nodes do you have and approximately how many keys are in Riak?
>
> -Jared
>
> [earlier messages snipped]
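With Bitcask every key is tracked in an in-memory keydir, so beam memory roughly follows the number of keys a node holds, and an uneven key distribution shows up as uneven memory even when partition counts match. A back-of-the-envelope estimate for this cluster, where every figure below is an assumption to adjust (the ~40-byte per-key overhead in particular; check the Bitcask capacity-planning notes for your version and use your real average key size):

# Rough Bitcask keydir memory estimate -- all constants are assumptions.
TOTAL_KEYS = 2.8e9        # keys in the cluster (from the thread)
NODES = 15                # cluster size
N_VAL = 3                 # replicas per key (Riak default, assumed)
AVG_KEY_SIZE = 30         # bytes, including bucket name -- adjust to your data
KEYDIR_OVERHEAD = 40      # approximate per-key bookkeeping bytes in the keydir

keys_per_node = TOTAL_KEYS * N_VAL / NODES
bytes_per_node = keys_per_node * (AVG_KEY_SIZE + KEYDIR_OVERHEAD)
print("~%.1f GB of keydir per node" % (bytes_per_node / 1024 ** 3))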
Nodes neither leaving nor joining the cluster
Hi,

I've installed Riak from the pre-compiled Debian binary and set up a cluster of 3 nodes (each node named in the format riak@<host>). The configuration file is the same on all nodes.

To clarify the problem, I'll call the nodes A, B, and C. Initially, all three nodes formed a cluster successfully (B and C joined node A). During some tests, node A was rebooted, and afterwards its "riak-admin status" no longer shows all three nodes as ring_members, while running "riak-admin status" on nodes B and C still lists ring_members as A, B, C. Running "riak-admin leave" on B and C didn't help, because running the join command again fails with "Failed: This node is already a member of a cluster". Force-removing nodes didn't do any good either; the ring-member status is still inconsistent.

Any suggestions?

Regards
Nitish
Re: Nodes neither leaving nor joining the cluster
Hi Alexandre,

Fortunately it's a test cluster. Removing the ring files, stopping and starting the nodes, and then running join again brought the cluster back to the desired state. The name and IP of every node stayed the same the whole time, yet restarting a single node left the whole cluster in an inconsistent state. Weird! Thanks for the tip, though.

Regards
Nitish

On Wed, Nov 9, 2011 at 5:54 PM, Alexandre Ravey wrote:
> Hi Nitish,
>
> Did you change the node name or IP at any point? If so, you need to reip your node(s).
>
> Run riak-admin status and check the node name in ring_members; it must be the same as your actual node (i.e. same IP and node name). Check also that you are not using xxx@127.0.0.1 if your nodes aren't on the same physical box.
>
> If it's a test cluster you may just rm /var/lib/riak/ring/* (node down) and join again.
>
> Regards
>
> Alexandre
>
> [original question snipped]
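To spot the mismatch Alexandre describes without logging into every box, you can compare the ring membership as each node reports it. A minimal sketch, assuming the default HTTP port 8098, placeholder hostnames, and that your build's /stats exposes nodename and ring_members (verify against your own output):

# ring_check.py -- compare ring membership as each node sees it (Python 2.7)
import json
import urllib2

NODES = ["nodeA.example.com", "nodeB.example.com", "nodeC.example.com"]  # placeholders

for host in NODES:
    stats = json.loads(urllib2.urlopen("http://%s:8098/stats" % host, timeout=10).read())
    # nodename is how this node identifies itself; ring_members is what it
    # believes the cluster membership to be. These should agree on every node.
    print("%s -> nodename=%s ring_members=%s" % (
        host, stats.get("nodename"), stats.get("ring_members")))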
Problem installing Riak Python client
Hi,

I am trying to install Riak's Python client library using pip, but it throws an IOError during installation: IOError: [Errno 2] No such file or directory: 'protobuf/setup.py'. Apparently a lot of people are hitting the same problem: the latest version of protobuf (2.4.1) moved setup.py from /setup.py to /python/setup.py. There is already an issue open for the Riak client: https://github.com/basho/riak-python-client/issues/19. Any suggestions on how to point pip at the correct setup.py?

Cheers
Nitish
Importing data to Riak
Hi,

This is more of a discussion than a question. I am trying to get a sense of how users import their data into Riak. For the data I am using, I was able to achieve almost 150 records/second with the PHP library and 400 records/second with node.js (I'm fairly new to node and was hitting a memory wall when trying to import 1 million records). What are some hacks/tricks/tweaks for importing large amounts of data into Riak?

Cheers
Nitish
Re: Importing data to Riak
Hi,

I tried importing the data using the Python library (with protocol buffers). After storing several objects, I get a thread exception with timeout errors. Here is the traceback:

  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "python_load_data.py", line 23, in worker
    new_obj.store()
  File "/usr/local/lib/python2.7/dist-packages/riak-1.3.0-py2.7.egg/riak/riak_object.py", line 296, in store
    Result = t.put(self, w, dw, return_body)
  File "/usr/local/lib/python2.7/dist-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 188, in put
    msg_code, resp = self.recv_msg()
  File "/usr/local/lib/python2.7/dist-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 370, in recv_msg
    raise Exception(msg.errmsg)
  Exception: timeout

The cluster consists of 3 nodes (Ubuntu 10.04). Any suggestions?

Cheers
Nitish

On Mon, Nov 14, 2011 at 2:20 PM, Andres Jaan Tack wrote:
> I was able to achieve similar results. I wrote a Ruby process that would keep at most n (I think n = 10) requests outstanding at once and reached 2,500-ish req/s on my MacBook Pro.
>
> I loaded data into a cluster of six Riak nodes by running several of these processes at once, each attached to a different Riak node, and I hit 18,000 req/s. I'm not sure whether spreading the load across nodes affected the speed or not, now that I think of it.
>
> 2011/11/14 Russell Brown:
>> On 14 Nov 2011, at 11:47, Nitish Sharma wrote:
>> > [original question snipped]
>>
>> New keys, new data, straight in for the first time, no fetch before store? I've had reasonable results creating a number of threads and using the Java raw PB client to write.
>>
>> For example, have one or a couple of threads that read data (from Oracle, a file, what have you) and put it on a queue, and a bunch of threads that pull data off the queue, create a Riak object, and store it. From my laptop I've got up to 2,500 writes a second like this, and it was just ad hoc, throwaway code with 4 threads against a small 3-node cluster (running on desktops).
>>
>> I imagine others on the list have more direct, real-world examples?
>>
>> Cheers
>>
>> Russell
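For reference, here is a minimal sketch of the reader-plus-worker-threads pattern Russell describes, written against the 1.3-era Python client API visible in the traceback above (RiakClient with RiakPbcTransport, bucket.new(...).store()). The host, port, bucket name, and record source are placeholders, not anything from the thread; check the class and argument names against your installed client.

# bulk_load.py -- queue + worker-thread loader sketch (Python 2.7, riak 1.3-era client)
import Queue
import threading
import riak

NUM_WORKERS = 4
q = Queue.Queue(maxsize=1000)   # bound the queue so the reader can't outrun the workers

def worker():
    # Each worker keeps its own client/connection.
    client = riak.RiakClient(host="127.0.0.1", port=8087,
                             transport_class=riak.RiakPbcTransport)
    bucket = client.bucket("imported_data")   # hypothetical bucket name
    while True:
        item = q.get()
        if item is None:        # poison pill -> shut this worker down
            q.task_done()
            break
        key, value = item
        bucket.new(key, data=value).store()
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Reader: replace this loop with your real data source (file, Oracle, etc.).
for i in range(1000000):
    q.put(("key-%d" % i, {"value": i}))

for _ in threads:
    q.put(None)                 # one poison pill per worker
q.join()                        # wait until every queued item has been stored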
Error while importing data
Hi,

To give my Riak setup a good stress test, I decided to import a large dataset (around 160 million records). Before importing the whole thing, I tested the import Python script (using protocol buffers) with 1 million records, which succeeded at ~2200 writes/sec. The script essentially puts the data onto a queue, and a couple of threads pull records from the queue and store them in Riak.

When started on the full dataset, after storing several million objects I get thread exceptions with timeout errors. Here is the traceback:

  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "python_load_data.py", line 23, in worker
    new_obj.store()
  File "/usr/local/lib/python2.7/dist-packages/riak-1.3.0-py2.7.egg/riak/riak_object.py", line 296, in store
    Result = t.put(self, w, dw, return_body)
  File "/usr/local/lib/python2.7/dist-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 188, in put
    msg_code, resp = self.recv_msg()
  File "/usr/local/lib/python2.7/dist-packages/riak-1.3.0-py2.7.egg/riak/transports/pbc.py", line 370, in recv_msg
    raise Exception(msg.errmsg)
  Exception: timeout

The cluster consists of 3 nodes (Ubuntu 10.04). The nodes have enough disk space; the number of file handles in use (~2500) is well within the limit (32768); the number of concurrent ports is 32768. I can't figure out what else could be causing the exceptions.

Any suggestions?

Cheers
Nitish
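As a sanity check on what the full import should take at the observed rate (plain arithmetic using the figures above, nothing Riak-specific):

# Rough import-time estimate at the observed write rate.
TOTAL_RECORDS = 160e6      # full dataset
WRITES_PER_SEC = 2200.0    # rate observed on the 1M-record test run

seconds = TOTAL_RECORDS / WRITES_PER_SEC
print("~%.1f hours for the full import" % (seconds / 3600))   # roughly 20 hours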
Re: Error while importing data
So far I've figured out that this error has nothing to do with Python. After a couple of million iterations, one of the nodes (a random node each time) in the cluster crashes, and the Python threads then time out. I am trying to make sense of the error and crash logs.

Cheers
Nitish

On Sat, Nov 19, 2011 at 10:16 PM, Erik Søe Sørensen wrote:
> A timeout... Do you know what the timeout threshold is? Have you tried increasing it (if possible; I don't know the Python client) or simply retrying once or twice on timeout?
>
> Also, which backend is Riak configured with? I believe eleveldb has occasional lower throughput/higher latency because of file compaction.
>
> [original message snipped]
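Erik's suggestion to retry on timeout can be done with a small wrapper around store(). This is only a sketch: the 1.3-era PB transport raises a plain Exception with the message "timeout" (as in the traceback above), so the matching here is deliberately loose.

# Retry helper sketch: re-attempt store() a few times when the PB transport times out.
import time

def store_with_retry(obj, retries=3, backoff=1.0):
    for attempt in range(retries):
        try:
            return obj.store()
        except Exception as e:          # old client raises a bare Exception("timeout")
            if "timeout" not in str(e).lower() or attempt == retries - 1:
                raise                   # not a timeout, or out of retries
            time.sleep(backoff * (attempt + 1))   # simple linear backoff

In the loader sketch earlier in this thread, the worker would call store_with_retry(bucket.new(key, data=value)) instead of calling store() directly.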
Riak node crash
Hi,

I have a Riak cluster consisting of 3 nodes. While running a simple HTTP query to list the buckets, or to list the keys in a bucket, one of the nodes crashes. I've tried variations of the query and the node crashes every time. Here is the console.log output:

2011-11-24 18:16:29.878 [error] <0.3455.0> CRASH REPORT Process <0.3455.0> with 0 neighbours crashed with reason: no case clause matching ok in riak_core_vnode:init/1
2011-11-24 18:16:29.878 [error] <0.3441.0>@riak_core_handoff_receiver:handle_info:82 Handoff receiver for partition undefined exited abnormally after processing 0 objects: {{{badmatch,{error,{{case_clause,ok},[{riak_core_vnode,init,1},{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}}},[{riak_core_vnode_master,get_vnode,2},{riak_core_vnode_master,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]},{gen_server,call,[riak_kv_vnode_master,{756441277134158736960891563808232422282855710720,get_vnode},infinity]}}
2011-11-24 18:16:29.885 [info] <0.557.0>@riak_kv_js_vm:terminate:228 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.557.0>)
2011-11-24 18:16:31.906 [error] <0.96.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.3455.0> exit with reason no case clause matching ok in riak_core_vnode:init/1 in context child_terminated
2011-11-24 18:16:34.995 [error] <0.3426.0> gen_server riak_kv_vnode_master terminated with reason: no match of right hand value {error,{{case_clause,ok},[{riak_core_vnode,init,1},{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}} in riak_core_vnode_master:get_vnode/2
2011-11-24 18:16:44.310 [error] <0.3426.0> CRASH REPORT Process riak_kv_vnode_master with 0 neighbours crashed with reason: no match of right hand value {error,{{case_clause,ok},[{riak_core_vnode,init,1},{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}} in riak_core_vnode_master:get_vnode/2
2011-11-24 18:16:47.202 [error] <0.178.0> Supervisor riak_kv_sup had child riak_kv_vnode_master started with riak_core_vnode_master:start_link(riak_kv_vnode, riak_kv_legacy_vnode) at <0.3426.0> exit with reason no match of right hand value {error,{{case_clause,ok},[{riak_core_vnode,init,1},{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}} in riak_core_vnode_master:get_vnode/2 in context child_terminated
2011-11-24 18:16:49.498 [error] <0.2762.0> gen_fsm <0.2762.0> in state active terminated with reason: {timeout,{gen_server,call,[<0.2763.0>,stop]}}
2011-11-24 18:16:49.719 [error] <0.2762.0> CRASH REPORT Process <0.2762.0> with 0 neighbours crashed with reason: {timeout,{gen_server,call,[<0.2763.0>,stop]}}

Any suggestions?

Cheers
Nitish
Reip(ing) riak node created two copies in the cluster
Hi,

We have a 12-node Riak cluster. Until now we had been naming every new node riak@<IP address>. We then decided to rename all the nodes to riak@<hostname>, which makes troubleshooting easier. After issuing the reip command on two nodes, we noticed in "status" that those 2 nodes now appeared in the cluster under both the old name and the new name. Other nodes were trying to hand off partitions to the "new" nodes, but apparently they were not able to. After this the whole cluster went down and completely stopped responding to read/write requests. member_status displayed the old Riak names in "legacy" mode. Since this is our production cluster, we are desperately looking for quick remedies. Issuing "force-remove" against the old names, restarting all the nodes, changing the Riak names back to the old ones - none of it helped.

Currently we are hosting a limited amount of data. What's an elegant way to recover from this mess? Would shutting down all the nodes, deleting the ring directory, and forming the cluster again work?

Cheers
Nitish
Re: Reip(ing) riak node created two copies in the cluster
Hi Jon,

Thanks for your input. I've already started working along those lines. I stopped all the nodes, moved the ring directory out of the way on one node, brought that node up, and then issued the join command from a second node (node2), after moving its ring directory as well. While they were busy redistributing the partitions, I started another node (node3) and issued the join command (before riak_kv was running, since it takes some time to load existing data). But after this, data handoffs occur only between node1 and node2. "member_status" says that node3 owns 0% of the ring with 0% pending. We have a lot of data - each node serves around 200 million documents. The Riak cluster is running 1.1.2.

Any suggestions?

Cheers
Nitish

On May 2, 2012, at 5:31 PM, Jon Meredith wrote:
> Hi Nitish,
>
> If you rebuild the cluster with the same ring size, the data will eventually get back to the right place. While the rebuild is taking place you may have notfounds for gets until the data has been handed off to the newly assigned owner (as it will be secondary handoff, not primary ownership handoff, that gets the data back). If you don't have a lot of data stored in the cluster it shouldn't take too long.
>
> The process would be to stop all nodes, move the files out of the ring directory to a safe place, start all nodes, and rejoin. If you're using 1.1.x and you have capacity in your hardware, you may want to increase handoff_concurrency to something like 4 to permit more transfers to happen across the cluster.
>
> Jon.
>
> [original message snipped]
>
> --
> Jon Meredith
> Platform Engineering Manager
> Basho Technologies, Inc.
> jmered...@basho.com
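For reference, handoff_concurrency in the 1.x line is a riak_core application setting. A minimal sketch of where it would sit in app.config, using the value 4 that Jon suggests (merge it into your existing riak_core section and keep the usual commas between entries):

%% app.config excerpt -- sketch only, merge into the existing riak_core section
{riak_core, [
    %% allow up to 4 concurrent partition transfers on this node
    {handoff_concurrency, 4}
    %% ... keep whatever riak_core settings are already here ...
]},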
Re: Reip(ing) riak node created two copies in the cluster
On May 2, 2012, at 6:12 PM, Jon Meredith wrote:
> Hi Nitish, for this to work you'll have to stop all the nodes at the same time, clear the ring on all nodes, start up all nodes, then rejoin.
>
> If you clear the rings one node at a time, when you rejoin the nodes the ring with the old and new style names will be gossipped back to it and you'll still have both names.

Sorry for the confusion. I didn't clear the rings one node at a time while keeping the other nodes live. These are the steps I followed:

1. Stop Riak on all the nodes.
2. Remove the ring directory from all nodes.
3. Start the nodes and rejoin.

> I didn't realize you had a large amount of data - originally you said "Currently, we are hosting limited amount of data", but 200 million docs per node seems like a fair amount. Rebuilding a cluster of that size may take a long time.

Yeah, we are currently serving a very limited amount because of the Riak outage. In total, we have almost 750 million documents served by Riak.

> Your options as I see them are:
>
> 1) If you have backups of the ring files, you could revert the node name changes and get the cluster stable again on riak@IP. The ring files have a timestamp associated with them, but we only keep a few of the last ring files, so if enough gossip has happened then the pre-rename rings will have been destroyed. You will have to stop all nodes, put the ring files back as they were before the change, fix the names in vm.args, and then restart the nodes.
>
> 2) You can continue on the rebuild plan: stop all nodes, set the new names in vm.args, start the nodes again, and rebuild the cluster, adding as many nodes as you can at once so they rebalance at the same time. When new nodes are added, the claimant node works out ownership changes and starts a sequence of transfers. If new nodes are added once a sequence is under way, the claimant waits for that to complete, then checks whether there are any new nodes, and repeats until all nodes are assigned. If you add all the nodes at once you will do fewer transfers overall.
>
> If the cluster cannot be stopped, there are other things we might be able to do, but they're a bit more complex. What are your uptime requirements?

We have currently stopped the cluster and are running on a small amount of data. We can wait for the partition redistribution to complete on Riak, but I don't have a strong feeling about it. "member_status" doesn't give us a correct picture: http://pastie.org/3849548. Is this expected behavior? I should also mention that all the nodes are still loading existing data, and it will take a few hours (2-3) until riak_kv is running on all of them.

Cheers
Nitish

> Jon
>
> [earlier messages snipped]