Re: speeding up riaksearch precommit indexing

Les Mikesell Sat, 18 Jun 2011 10:09:29 -0700

Is there a good way to handle something like this with redundancy all the way through? On simple key/value items you could have two readers write the same things to riak and let bitcask cleanup eventually discard one, but with indexing you probably need to use some sort of failover approach up front. Do any of those queue managers handle that without adding their own single point of failure? Assuming there are unique identifiers in the items being written, you might use the CAS feature of redis to arbitrate writes into its queue, but what happens when the redis node fails?
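
A minimal sketch of the arbitration idea above, assuming each item carries a unique "id" field. ClaimTable, ingest and run_demo are hypothetical names, and the claim table is an in-memory stand-in for an atomic set-if-absent operation (something like Redis's SETNX command or a WATCH/MULTI/EXEC transaction). It only illustrates how two redundant readers can feed one queue without duplicates; the claim table itself is still a single point of failure, so it does not answer the failover question.

```python
import threading


class ClaimTable:
    """In-memory stand-in for an atomic set-if-absent store (assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}

    def claim(self, item_id, reader):
        """Return True only for the first reader that claims item_id."""
        with self._lock:
            if item_id in self._owners:
                return False
            self._owners[item_id] = reader
            return True


def ingest(reader, items, claims, out):
    """Enqueue each item only if this reader wins the claim for its id."""
    for item in items:
        if claims.claim(item["id"], reader):
            out.append(item)  # list.append is atomic in CPython


def run_demo(n_items=100):
    """Two redundant readers see the same stream; each item is queued once."""
    items = [{"id": i} for i in range(n_items)]
    claims = ClaimTable()
    out = []
    threads = [
        threading.Thread(target=ingest, args=(name, items, claims, out))
        for name in ("reader-a", "reader-b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Either reader can die and the other keeps feeding the queue, which is the property wanted here; making the arbiter itself redundant is the part that remains open.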


  -Les



On 6/17/11 11:48 PM, John D. Rowell wrote:

Why not decouple the twitter stream processing from the indexing? More than
likely you have a single process consuming the spritzer stream, so you can put
the fetched results in a queue (hornetq, beanstalk, or even a simple Redis
queue) and then have workers pull from the queue and insert into Riak. You could
run one worker per node and thus insert in parallel into all nodes. If you need
free CPU (e.g. for searches), just throttle the workers to some sane level. If
you see the queue getting bigger, add another Riak node (and thus another local
worker).
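
A rough sketch of the layout described above, under some loud assumptions: the standard-library queue.Queue stands in for hornetq/beanstalk/Redis, a plain dict stands in for the Riak put, and worker, run_demo and max_per_sec are hypothetical names. In a real deployment each worker would be a separate process on its own Riak node rather than a thread.

```python
import queue
import threading
import time


def worker(q, store, max_per_sec, stop):
    """Pull items off the queue and store them, at most max_per_sec per second."""
    interval = 1.0 / max_per_sec
    while True:
        try:
            item = q.get(timeout=0.05)
        except queue.Empty:
            if stop.is_set():
                return
            continue
        store[item["id"]] = item  # stand-in for a put to the local Riak node
        q.task_done()
        time.sleep(interval)  # crude throttle to leave CPU free for searches


def run_demo(n_items=20, n_workers=2, max_per_sec=200):
    """One producer (the stream consumer) and n_workers throttled consumers."""
    q = queue.Queue()
    store = {}
    stop = threading.Event()
    workers = [
        threading.Thread(target=worker, args=(q, store, max_per_sec, stop))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for i in range(n_items):
        q.put({"id": i, "text": "tweet %d" % i})
    q.join()  # wait until every queued item has been stored
    stop.set()
    for w in workers:
        w.join()
    return store
```

Watching q.qsize() over time would be the signal mentioned above: if it keeps growing at the chosen throttle, add a node and a worker.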

-jd

2011/6/13 Steve Webb <sw...@gnip.com>

    Ok, I've changed my two VMs to each have:

    3 CPUs, 1GB ram, 120GB disk

    I'm ingesting the twitter spritzer stream (about 10-20 tweets per second,
    approx 2k of data per tweet).  One bucket is storing the non-indexed tweets
    in full.  Another bucket is storing the indexed tweet string, id, date and
    username.  A maximum of 20 clients can be hitting the 'cluster' at any one
    time.

    I'm using n_val=2 so there is replication going on behind the scenes.

    I'm using a hardware load-balancer to distribute the work amongst the two
    nodes and now I'm seeing about 75% CPU usage as opposed to 100% on one node
    and 50% on the replicating-only node.

    I've monitored the VM over the last few days and it seems to be mostly
    CPU-bound.  The disk I/O is low.  The Network I/O is low.

    Q: Can I change the pre-commit to a post-commit trigger or something perhaps
    or will that make any difference at all?  I'm ok if the tweet stuff doesn't
    get indexed immediately and there's a slight lag in indexing if it saves on
    CPU.




_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



