Re: speeding up riaksearch precommit indexing

2011-06-22 Thread Mathias Meyer
Les, maybe it's worth looking into Beetle [1], which is an HA messaging solution built on RabbitMQ and Redis. It supports multiple brokers and uses Redis for message de-duplication. It's written in Ruby, but should still give you some inspiration on how something like this could be achieved.

Re: speeding up riaksearch precommit indexing

2011-06-21 Thread Les Mikesell
I'd like to have fully redundant feeds with no single point of failure, but avoid the work of indexing the duplicate copy and having it written to a bitcask even if it would eventually be cleaned up. On 6/21/2011 4:43 PM, Sylvain Niles wrote: Why not write to a queue bucket with a timestamp a

Re: speeding up riaksearch precommit indexing

2011-06-21 Thread Sylvain Niles
Why not write to a queue bucket with a timestamp and have a queue processor move writes to the "final" bucket once they're over a certain age? It can dedup/validate at that point too. On Tue, Jun 21, 2011 at 2:26 PM, Les Mikesell wrote: > Where can I find the redis hacks that get close to cluste
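A minimal sketch of the queue-bucket idea described above, assuming plain Python dicts stand in for the Riak buckets. The names `enqueue`, `process_queue`, `min_age`, and the `(feed, item_id)` key layout are hypothetical and not from the thread; a real version would go through a Riak client instead.

```python
def enqueue(queue_bucket, feed, item_id, payload, now):
    """Store an incoming write in the queue bucket with its arrival timestamp.

    Keying by (feed, item_id) lets redundant feeds enqueue the same item
    without overwriting each other.
    """
    queue_bucket[(feed, item_id)] = {"ts": now, "item_id": item_id, "payload": payload}


def process_queue(queue_bucket, final_bucket, min_age, now):
    """Move entries older than min_age into the final bucket, skipping duplicates.

    Returns the number of entries actually written to the final bucket.
    """
    moved = 0
    # sorted() builds a list, so deleting from the dict inside the loop is safe.
    for qkey in sorted(queue_bucket, key=lambda k: queue_bucket[k]["ts"]):
        entry = queue_bucket[qkey]
        if now - entry["ts"] < min_age:
            continue  # too young; leave it for a later pass
        if entry["item_id"] not in final_bucket:
            final_bucket[entry["item_id"]] = entry["payload"]
            moved += 1
        # Old enough: drop it from the queue whether it was written or a duplicate.
        del queue_bucket[qkey]
    return moved
```

With two feeds delivering the same item, only one copy should reach the final bucket, so only one copy would be indexed there.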

Re: speeding up riaksearch precommit indexing

2011-06-21 Thread Les Mikesell
Where can I find the redis hacks that get close to clustering? Would membase work with synchronous replication on a pair of nodes for a reliable atomic 'check and set' operation to dedup redundant data before writing to riak? Conceptually I like the 'smart client' fault tolerance of memcache/
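To illustrate the 'check and set' dedup step asked about above, here is a rough in-process sketch. It assumes a lock-protected Python set stands in for the shared store (membase or Redis in the thread); `SeenSet`, `claim`, and `write_once` are hypothetical names, and this says nothing about how either product replicates data.

```python
import threading


class SeenSet:
    """In-memory stand-in for a store with an atomic add-if-absent operation."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, item_id):
        """Return True only for the first caller to claim item_id."""
        with self._lock:  # check and set happen under one lock, so they are atomic
            if item_id in self._seen:
                return False
            self._seen.add(item_id)
            return True


def write_once(seen, store, item_id, payload):
    """Write payload to store only if no other reader has claimed item_id."""
    if seen.claim(item_id):
        store[item_id] = payload
        return True
    return False
```

Two redundant readers calling `write_once` with the same id would result in a single write, so the duplicate is never indexed.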

Re: speeding up riaksearch precommit indexing

2011-06-18 Thread John D. Rowell
The "real" queues like HornetQ and others can take care of this without a single point of failure, but it's a pain (in my opinion) to set them up that way, and usually with all the cluster and failover features active they get quite slow for writes. We use Redis for this because it's simpler and ligh

Re: speeding up riaksearch precommit indexing

2011-06-18 Thread Les Mikesell
Is there a good way to handle something like this with redundancy all the way through? On simple key/value items you could have two readers write the same things to riak and let bitcask cleanup eventually discard one, but with indexing you probably need to use some sort of failover approach up

Re: speeding up riaksearch precommit indexing

2011-06-17 Thread John D. Rowell
Why not decouple the twitter stream processing from the indexing? More than likely you have a single process consuming the spritzer stream, so you can put the fetched results in a queue (hornetq, beanstalk, or even a simple Redis queue) and then have workers pull from the queue and insert into Riak
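The decoupling suggested above can be sketched roughly as follows, assuming Python's standard `queue.Queue` stands in for hornetq/beanstalk/Redis and a dict stands in for Riak. `run_pipeline` and the tweet dict shape (`id`, `text`) are made up for illustration.

```python
import queue
import threading


def run_pipeline(tweets, num_workers=2):
    """Feed tweets through a queue to worker threads that 'insert' them into a store."""
    work = queue.Queue()
    store = {}                 # stand-in for the Riak bucket
    lock = threading.Lock()    # protects the shared dict

    def worker():
        while True:
            tweet = work.get()
            if tweet is None:  # sentinel: no more work for this worker
                work.task_done()
                break
            with lock:
                store[tweet["id"]] = tweet["text"]
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The single stream consumer only enqueues; it never waits on an insert.
    for tweet in tweets:
        work.put(tweet)
    for _ in threads:
        work.put(None)         # one sentinel per worker

    work.join()
    for t in threads:
        t.join()
    return store
```

The point of the design is that slow indexed inserts back up in the queue rather than stalling the process reading the stream.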

Re: speeding up riaksearch precommit indexing

2011-06-15 Thread Rusty Klophaus
Hi Steve, Thanks for sending over more details. The pre- vs. post-commit hook question is a good one. We chose a pre-commit hook over a post-commit hook for Riak Search indexing because a post-commit hook doesn't currently provide back-pressure to the Riak KV side of the system. It

Re: speeding up riaksearch precommit indexing

2011-06-13 Thread Steve Webb
Ok, I've changed my two VMs to each have 3 CPUs, 1GB RAM, and a 120GB disk. I'm ingesting the twitter spritzer stream (about 10-20 tweets per second, approx 2k of data per tweet). One bucket is storing the non-indexed tweets in full. Another bucket is storing the indexed tweet string, id, date an

Re: speeding up riaksearch precommit indexing

2011-06-09 Thread Rusty Klophaus
Hi Steve, Riak does best with a lot of memory and a fast disk. Depending on how much data you have in the system, putting two nodes into 1GB of memory on a single VM may be causing the system to overrun available resources and page out to disk, and depending on how you've set up your virtualized e

speeding up riaksearch precommit indexing

2011-06-07 Thread Steve Webb
Hey there. I'm inserting twitter spritzer tweets into a bucket that doesn't have a precommit index hook, and a few fields from the tweet into a second bucket that does have the precommit hook. Speeds on the inserts into the indexed bucket are an order of magnitude slower than the non-indexed