Re: Pig + Cassandra = Connection errors

2010-08-18 Thread Drew Dahlke
What's your cassandra timeout configured to? It's not uncommon to raise that to 30sec if you're getting timeouts. On Wed, Aug 18, 2010 at 8:17 AM, Christian Decker wrote: > Hi all, > I'm trying to get Pig scripts to work on data in Cassandra and right now I > want to simply run the example-script

Re: Map/Reduce over Cassandra

2010-08-18 Thread Drew Dahlke
Hey Bill, A few months ago we did an experiment with 5 hadoop nodes pulling from 4 cass nodes. It was pulling down 1 column family with 8 small columns & just dumping the raw data to hdfs. It was cycling through around 17K map tasks per sec. The machines weren't being taxed too hard, so I'm sure t

Re: Cassandra vs MongoDB

2010-07-27 Thread Drew Dahlke
There's a good post on stackoverflow comparing the two http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra It seems to me that both projects have pretty vibrant communities behind them. On Tue, Jul 27, 2010 at 11:14 AM, Mark wrote: > Can someone quickly explain the differences betwee

Re: Map Reduce support

2010-06-28 Thread Drew Dahlke
I'm afraid I didn't hold on to it, sorry folks On Mon, Jun 28, 2010 at 8:58 AM, Carlos Sanchez wrote: > Drew, > > I was wondering if you care to share your map-reduce code > > Thanks > > Carlos > ________ > From: Drew Dahlke [d

Re: Map Reduce support

2010-06-28 Thread Drew Dahlke
them though in some time. > > On Fri, Jun 25, 2010 at 5:46 PM, Drew Dahlke wrote: >> >> The cassandra column family input format will go over an entire >> column family sending a slice of a row into a mapper at a time. From >> there, there's a lot you can d

Re: Map Reduce support

2010-06-25 Thread Drew Dahlke
The cassandra column family input format will go over an entire column family, sending a slice of a row into a mapper at a time. From there, there's a lot you can do. As far as how you aggregate data together, I'd suggest experimenting with the latest version of Pig which thankfully supports the ne

Cassandra timeouts under low load

2010-06-15 Thread Drew Dahlke
Hi, I'm running cassandra 0.6.2 on a dedicated 4 node cluster and I also have a dedicated 4 node hadoop cluster. I'm trying to run a simple map reduce job against a single column family and it only takes 32 map tasks before I get floods of thrift timeouts. That would make sense to me if the cassandr