Re: Cassandra w/ Hadoop

Mark Thu, 19 Aug 2010 11:15:27 -0700

On 8/19/10 10:23 AM, Jeremy Hanna wrote:

I would check out http://wiki.apache.org/cassandra/HadoopSupport for more info.
I'll try to explain a bit more here, but I don't think there's a tutorial out
there yet.


For input:
- Configure the main class that starts your MapReduce job the way the
word_count example does (either with storage-conf.xml or in your code via
ConfigHelper). It will complain specifically about anything you haven't
configured; your Cassandra server and port are especially important.
- The inputs to your mapper are what comes out of Cassandra: the row key plus
a map of that row's column values.
- Set the column name you want to read in your mapper's overridden setup()
method.
- For the reducer, nothing really changes from a normal MapReduce job unless
you want to output to Cassandra.
- Generally, Cassandra just provides an InputFormat and split classes for
reading from Cassandra; you can find the guts in the org.apache.cassandra.hadoop
package.
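
To make the input side concrete, here is a rough, stdlib-only Java model of the
data flow described above. It does not use the real Hadoop or Cassandra classes,
so none of the names below are their API: a row is modeled as a row key plus a
SortedMap of column name to value, setup() reads the column to count from a plain
Map standing in for the job configuration (the "columnname" key is hypothetical),
map() emits (word, 1) pairs, and the reducer is an ordinary sum.

```java
import java.util.*;

// Stdlib-only model of the input flow; it does NOT use the real Hadoop or
// Cassandra classes. A row is: row key -> (column name -> column value).
public class WordCountModel {

    // Hypothetical configuration key; the real word_count example defines its own.
    public static final String COLUMN_NAME_KEY = "columnname";

    // Stands in for the mapper: setup() picks the column, map() runs once per row.
    static class RowMapper {
        private String columnName;

        void setup(Map<String, String> conf) {
            columnName = conf.get(COLUMN_NAME_KEY);
            if (columnName == null) {
                throw new IllegalStateException(COLUMN_NAME_KEY + " is not configured");
            }
        }

        // Emits a (word, 1) pair for every token in the configured column of one row.
        List<Map.Entry<String, Integer>> map(String rowKey, SortedMap<String, String> columns) {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            String value = columns.get(columnName);
            if (value == null) {
                return out; // this row does not have the configured column
            }
            StringTokenizer tokens = new StringTokenizer(value);
            while (tokens.hasMoreTokens()) {
                out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
            }
            return out;
        }
    }

    // The reduce side is an ordinary sum, as in any word count.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map, an in-memory "shuffle" (group values by key), then reduce.
    public static SortedMap<String, Integer> run(Map<String, SortedMap<String, String>> rows,
                                                 Map<String, String> conf) {
        RowMapper mapper = new RowMapper();
        mapper.setup(conf);
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, SortedMap<String, String>> row : rows.entrySet()) {
            for (Map.Entry<String, Integer> kv : mapper.map(row.getKey(), row.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, SortedMap<String, String>> rows = new TreeMap<>();
        rows.put("key0", new TreeMap<>(Map.of("text0", "word1 word2 word1")));
        rows.put("key1", new TreeMap<>(Map.of("text0", "word2", "text1", "ignored")));
        System.out.println(run(rows, Map.of(COLUMN_NAME_KEY, "text0"))); // prints {word1=2, word2=2}
    }
}
```

Note that the column other than the configured one ("text1") is simply never
read, which is why the column name has to be known before map() is called.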

For output:
- In your reducer, you could just write to Cassandra directly via Thrift.
There is a built-in OutputFormat coming in 0.7, though it might still change
before 0.7 final; it queues up changes so that it writes them in large blocks
all at once.
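
The batching idea behind that output format can be sketched in plain Java too.
This is a hypothetical model, not the 0.7 OutputFormat or the Thrift API:
BatchingWriter and Mutation are made-up names, and the sink stands in for
whatever actually sends one block to Cassandra (for example, a single Thrift
batch write issued from your reducer).

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical, stdlib-only sketch of queued writes; NOT the Cassandra 0.7
// OutputFormat. The sink stands in for one round trip to Cassandra.
public class BatchingWriter implements AutoCloseable {

    // One queued change: (row key, column name, value).
    public static final class Mutation {
        final String rowKey;
        final String column;
        final String value;

        Mutation(String rowKey, String column, String value) {
            this.rowKey = rowKey;
            this.column = column;
            this.value = value;
        }
    }

    private final int batchSize;
    private final Consumer<List<Mutation>> sink;
    private final List<Mutation> buffer = new ArrayList<>();

    public BatchingWriter(int batchSize, Consumer<List<Mutation>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per reducer output; only hits the sink when the buffer is full.
    public void write(String rowKey, String column, String value) {
        buffer.add(new Mutation(rowKey, column, value));
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends everything queued so far as a single block.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }

    // Flushes the last partial block; call this when the task finishes.
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<Integer> blockSizes = new ArrayList<>();
        try (BatchingWriter w = new BatchingWriter(2, block -> blockSizes.add(block.size()))) {
            w.write("word1", "count", "2");
            w.write("word2", "count", "2");
            w.write("word3", "count", "1");
        }
        System.out.println(blockSizes); // prints [2, 1]
    }
}
```

The trade-off is fewer, larger round trips in exchange for having to flush
whatever is still buffered at the end of the task, which is why close() flushes.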


On Aug 19, 2010, at 12:07 PM, Mark wrote:

Are there any examples or tutorials on the web for reading from and writing to
Cassandra with Hadoop?

I found the example in contrib/word_count, but I really can't make sense of
it. A tutorial or explanation would help.

Thanks!

