Re: speeding up riaksearch precommit indexing

2011-06-18 Thread Les Mikesell
Is there a good way to handle something like this with redundancy all the way 
through?  On simple key/value items you could have two readers write the same 
things to Riak and let Bitcask's cleanup eventually discard the duplicate, but 
with indexing you probably need some sort of failover approach up front.  Do 
any of those queue managers handle that without adding their own single point 
of failure?  Assuming there are unique identifiers in the items being written, 
you might use Redis's check-and-set support to arbitrate writes into its 
queue, but what happens when the Redis node fails?
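
For what it's worth, the arbitration could be done with plain SETNX rather
than a full check-and-set round trip; a rough, untested sketch with redis-py,
assuming each item carries a unique 'id' (all names here are made up):

    import json
    import redis

    r = redis.Redis(host='localhost', port=6379)

    def enqueue_once(item):
        # Both redundant readers call this for every item they pull.
        # SETNX is atomic, so exactly one reader wins per unique id;
        # the loser simply drops its copy.
        key = 'seen:%s' % item['id']
        if r.setnx(key, 1):
            r.expire(key, 3600)  # keep the dedup keys from piling up forever
            r.rpush('tweet-queue', json.dumps(item))

That still leaves the "what happens when the Redis node fails" part open, of
course.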


  -Les


On 6/17/11 11:48 PM, John D. Rowell wrote:

Why not decouple the Twitter stream processing from the indexing? More than
likely you have a single process consuming the spritzer stream, so you can put
the fetched results in a queue (HornetQ, Beanstalkd, or even a simple Redis
queue) and then have workers pull from the queue and insert into Riak. You could
run one worker per node and thus insert in parallel into all nodes. If you need
free CPU (e.g. for searches), just throttle the workers to some sane level. If
you see the queue getting bigger, add another Riak node (and thus another local
worker).
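
A minimal worker sketch (untested; assumes redis-py and the basho Python riak
client, with placeholder queue and bucket names):

    import json
    import redis
    import riak

    r = redis.Redis(host='localhost', port=6379)
    client = riak.RiakClient(host='127.0.0.1', port=8098)
    tweets = client.bucket('tweets')

    while True:
        # BLPOP blocks until an item is available and
        # returns a (queue_name, payload) pair.
        _, payload = r.blpop('tweet-queue')
        tweet = json.loads(payload)
        tweets.new(str(tweet['id']), data=tweet).store()

Throttling is then just a sleep between pops, and adding capacity is starting
another copy of the worker on the new node.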

-jd

2011/6/13 Steve Webb <sw...@gnip.com>

Ok, I've changed my two VMs to each have:

3 CPUs, 1 GB RAM, 120 GB disk

I'm ingesting the Twitter spritzer stream (about 10-20 tweets per second,
approx 2 KB of data per tweet).  One bucket is storing the non-indexed tweets
in full.  Another bucket is storing the indexed tweet string, id, date and
username.  A maximum of 20 clients can be hitting the 'cluster' at any one
time.

I'm using n_val=2 so there is replication going on behind the scenes.

I'm using a hardware load-balancer to distribute the work between the two
nodes, and now I'm seeing about 75% CPU usage on both, as opposed to 100% on
one node and 50% on the replicating-only node.

I've monitored the VMs over the last few days and they seem to be mostly
CPU-bound.  Disk I/O is low.  Network I/O is low.

Q: Can I change the pre-commit hook to a post-commit trigger, or something
similar, and would that make any difference at all?  I'm OK with the tweets
not being indexed immediately, and with a slight indexing lag, if it saves
on CPU.
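
For context, the Search indexing here is the precommit hook that 'search-cmd
install <bucket>' sets on the bucket.  An untested sketch for inspecting it
from the Python client (placeholder bucket name):

    import riak

    client = riak.RiakClient(host='127.0.0.1', port=8098)
    bucket = client.bucket('tweets_indexed')

    # 'search-cmd install <bucket>' adds this entry to the precommit list:
    #   {"mod": "riak_search_kv_hook", "fun": "precommit"}
    print(bucket.get_property('precommit'))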






Re: speeding up riaksearch precommit indexing

2011-06-18 Thread John D. Rowell
The "real" queues like HornetQ and others can take care of this without a
single point of failure but it's a pain (in my opinion) to set them up that
way, and usually with all the cluster and failover features active they get
quite slow for writes.We use Redis for this because it's simpler and
lightweight. The problem is that there is no real clustering option for
Redis today, even thought there are some hacks that get close. When we
cannot afford a single point of failure or any downtime, we tend to use
MongoDB for simple queues. It has full cluster support and the performance
is pretty close to what you get with Redis in this use case.
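
A sketch of the MongoDB-as-a-queue pattern with current pymongo (the 2011-era
equivalent of the atomic pop was find_and_modify(..., remove=True); names are
placeholders, untested):

    import pymongo

    client = pymongo.MongoClient('localhost', 27017)
    queue = client.queues.tweets

    def push(item):
        queue.insert_one(item)

    def pop():
        # Atomically fetch and remove the oldest document in one round
        # trip, so concurrent workers never see the same item twice.
        return queue.find_one_and_delete(
            {}, sort=[('_id', pymongo.ASCENDING)])

pop() returns None when the queue is empty, so workers can poll with a short
sleep.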

OTOH you could keep it all Riak and set up a separate small cluster with a
RAM backend and use that as a queue, probably with similar performance.  The
idea here is that you can scale these clusters (the "queue" and the indexed
production data) independently in response to your load patterns, and have
optimum hardware and I/O specs for the different cluster nodes.
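
Concretely, that queue cluster would just point riak_kv at the in-memory
backend in app.config; something like the fragment below (check the exact
backend name against your Riak release; older ones ship riak_kv_ets_backend
instead):

    %% app.config on the "queue" cluster's nodes
    {riak_kv, [
        %% hold queue data in RAM instead of bitcask
        {storage_backend, riak_kv_memory_backend}
        %% (other riak_kv settings unchanged)
    ]},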

-jd

2011/6/18 Les Mikesell 

> Is there a good way to handle something like this with redundancy all the
> way through?  On simple key/value items you could have two readers write the
> same things to riak and let bitcask cleanup eventually discard one, but with
> indexing you probably need to use some sort of failover approach up front.
>  Do any of those queue managers handle that without adding their own single
> point of failure?  Assuming there are unique identifiers in the items being
> written, you might use the CAS feature of redis to arbitrate writes into its
> queue, but what happens when the redis node fails?
>
>  -Les


Riak crash on 0.14.2 riak_kv_stat terminating

2011-06-18 Thread Jeremy Raymond
Hello,

I have a 3-node Riak 0.14.2 cluster, installed from the deb packages, running
on Ubuntu 10.10.  One node went down with the following error from
sasl-error.log.  Any ideas on tracking down the cause?

- Jeremy


=ERROR REPORT 17-Jun-2011::16:26:46 ===
** Generic server riak_kv_stat terminating
** Last message in was {'$gen_cast',{update,vnode_get,63475547206}}
** When Server state ==
    {state,
     {spiral,63475547266,
      [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
     {spiral,63475547266,
      [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
     2846696,42859,35845,31080,
     {slide,63475532874,60,60,
      "/tmp/riak/slide-data/474/1308.328066.57288",
      {file_descriptor,prim_file,{#Port<0.64895>,24}},
      63475547205},
     {slide,63475532874,60,60,
      "/tmp/riak/slide-data/474/1308.328066.59517",
      {file_descriptor,prim_file,{#Port<0.64896>,13}},
      63475547205},
     {spiral,63475547266,
      [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
     978,4,
     {spiral,63475547266,
      [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},
     21375,21}
** Reason for termination ==
** {badarg,[{erlang,hd,[[]]},
{spiraltime,incr,3},
{riak_kv_stat,spiral_incr,3},
{riak_kv_stat,handle_cast,2},
{gen_server2,handle_msg,7},
{proc_lib,init_p_do_apply,3}]}

=CRASH REPORT 17-Jun-2011::16:26:46 ===
  crasher:
initial call: gen:init_it/7
pid: <0.151.0>
registered_name: riak_kv_stat
exception exit:
    {badarg,[{erlang,hd,[[]]},
             {spiraltime,incr,3},
             {riak_kv_stat,spiral_incr,3},
             {riak_kv_stat,handle_cast,2},
             {gen_server2,handle_msg,7},
             {proc_lib,init_p_do_apply,3}]}
  in function  gen_server2:terminate/6
  in call from proc_lib:init_p_do_apply/3
ancestors: [riak_kv_sup,<0.138.0>]
messages:
    [{$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,mapper_end,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,mapper_end,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_get,63475547206}},
     {$gen_cast,{update,vnode_put,63475547206}}]
links: [#Port<0.64896>,<0.145.0>,#Port<0.64895>]
dictionary: []
trap_exit: true
status: running
heap_size: 2584
stack_size: 24
reductions: 81329187
  neighbours:

=SUPERVISOR REPORT 17-Jun-2011::16:26:46 ===
 Supervisor: {local,riak_kv_sup}
 Context:    child_terminated
 Reason:
     {badarg,[{erlang,hd,[[]]},
              {spiraltime,incr,3},
              {riak_kv_stat,spiral_incr,3},
              {riak_kv_stat,handle_cast,2},
              {gen_server2,handle_msg,7},
              {proc_lib,init_p_do_apply,3}]}
 Offender:
     [{pid,<0.151.0>},
      {name,riak_kv_stat},
      {mfa,{riak_kv_stat,start_link,[]}},
      {restart_type,permanent},
      {shutdown,5000},
      {child_type,worker}]