Is there a good way to handle something like this with redundancy all the way
through? With simple key/value items you could have two readers write the same
things to Riak and let Bitcask's cleanup eventually discard one, but with
indexing you probably need some sort of failover approach up front. Do any of
those queue managers handle that without adding their own single point of
failure? Assuming there are unique identifiers in the items being written, you
might use the CAS feature of Redis to arbitrate writes into its queue, but what
happens when the Redis node fails?
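To make the arbitration idea concrete, here is a rough sketch of two redundant readers where only the first one to claim an item's unique id gets to enqueue it. Everything in it is an assumption for illustration: `ClaimTable` is a hypothetical in-memory stand-in for an atomic set-if-absent store (with Redis, `SET` with the `NX` option could play that role), and the plain list stands in for the real queue. It does not answer the failure question; the claim table is still a single point of failure here.

```python
import threading


class ClaimTable:
    """Hypothetical in-memory stand-in for an atomic set-if-absent store.

    A lock-protected set keeps the sketch runnable without a server.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()

    def claim(self, item_id):
        """Return True only for the first caller to claim item_id."""
        with self._lock:
            if item_id in self._seen:
                return False
            self._seen.add(item_id)
            return True


def reader(name, stream, claims, out_queue):
    """Redundant reader: enqueue an item only if this reader claimed its id."""
    for item in stream:
        if claims.claim(item["id"]):
            out_queue.append((name, item))


def run_two_readers(stream):
    """Run two readers over the same stream; each item is enqueued once."""
    claims = ClaimTable()
    out_queue = []  # assumption: stands in for the real work queue
    threads = [
        threading.Thread(target=reader, args=(n, stream, claims, out_queue))
        for n in ("reader-a", "reader-b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out_queue
```

If either reader dies, the other keeps claiming and enqueueing on its own, so the readers themselves are redundant even though the arbiter is not.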
-Les
On 6/17/11 11:48 PM, John D. Rowell wrote:
Why not decouple the twitter stream processing from the indexing? More than
likely you have a single process consuming the spritzer stream, so you can put
the fetched results in a queue (hornetq, beanstalk, or even a simple Redis
queue) and then have workers pull from the queue and insert into Riak. You could
run one worker per node and thus insert in parallel into all nodes. If you need
free CPU (e.g. for searches), just throttle the workers to some sane level. If
you see the queue getting bigger, add another Riak node (and thus another local
worker).
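A minimal sketch of that layout, assuming an in-process `queue.Queue` in place of hornetq, beanstalk, or Redis, and a caller-supplied `store` function in place of the actual Riak insert. The names `worker`, `run_pipeline`, and `max_per_second` are made up for the example.

```python
import queue
import threading
import time


def worker(jobs, store, max_per_second, stop):
    """Pull items off the queue and store them, capped at max_per_second."""
    min_interval = 1.0 / max_per_second
    while not stop.is_set() or not jobs.empty():
        try:
            item = jobs.get(timeout=0.05)
        except queue.Empty:
            continue
        started = time.monotonic()
        store(item)  # assumption: stands in for an insert into the local Riak node
        jobs.task_done()
        # Throttle: sleep off whatever is left of this item's time slot.
        remaining = min_interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)


def run_pipeline(items, store, workers=2, max_per_second=200):
    """Single consumer enqueues; one throttled worker per node drains the queue."""
    jobs = queue.Queue()
    stop = threading.Event()
    threads = [
        threading.Thread(target=worker, args=(jobs, store, max_per_second, stop))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for item in items:  # the stream consumer only enqueues, it never indexes
        jobs.put(item)
    jobs.join()  # wait until every queued item has been stored
    stop.set()
    for t in threads:
        t.join()
```

Watching `jobs.qsize()` over time would be the in-process equivalent of seeing the queue grow; lowering `max_per_second` is the throttle that frees CPU for searches.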
-jd
2011/6/13 Steve Webb <sw...@gnip.com>
OK, I've changed my two VMs so that each has:
3 CPUs, 1GB RAM, 120GB disk
I'm ingesting the twitter spritzer stream (about 10-20 tweets per second,
approx 2k of data per tweet). One bucket is storing the non-indexed tweets
in full. Another bucket is storing the indexed tweet string, id, date and
username. A maximum of 20 clients can be hitting the 'cluster' at any one
time.
I'm using n_val=2, so there is replication going on behind the scenes.
I'm using a hardware load balancer to distribute the work between the two
nodes, and now I'm seeing about 75% CPU usage on each, as opposed to 100% on
one node and 50% on the replicating-only node.
I've monitored the VM over the last few days and it seems to be mostly
CPU-bound. The disk I/O is low. The Network I/O is low.
Q: Can I change the pre-commit trigger to a post-commit trigger or something
similar, or will that make no difference at all? I'm OK with tweets not being
indexed immediately and with a slight lag in indexing if it saves on CPU.
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com