On 8/19/10 11:14 AM, Mark wrote:
On 8/19/10 10:23 AM, Jeremy Hanna wrote:
I would check out http://wiki.apache.org/cassandra/HadoopSupport for
more info. I'll try to explain a bit more here, but I don't think
there's a tutorial out there yet.
For input:
- configure the main class where you start the MapReduce job the same
way word_count does (either via storage-conf.xml or in code via the
ConfigHelper) - see the driver sketch after this list. It will complain
specifically about anything you haven't configured - especially
important are your Cassandra server and port.
- the inputs to your mapper are what comes from Cassandra - the row
key plus a map of that row's columns
- you need to grab your column name in your mapper's overridden setup
method (also shown in the sketch below)
- for the reducer, nothing really changes from a normal map/reduce
job unless you want to output to Cassandra
- generally Cassandra just provides an InputFormat and split classes
for reading from Cassandra - you can find the guts in the
org.apache.cassandra.hadoop package
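To make that concrete, here's a rough driver + mapper sketch modeled
on contrib/word_count. This is against the 0.6-era API (ConfigHelper
method names and the mapper's input types may differ in later
versions), and the keyspace, column family, and column names are just
placeholders:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MyJob
    {
        private static final String KEYSPACE = "Keyspace1";      // placeholder
        private static final String COLUMN_FAMILY = "Standard1"; // placeholder
        private static final String CONF_COLUMN_NAME = "column_name";

        public static class MyMapper
            extends Mapper<String, SortedMap<byte[], IColumn>, Text, IntWritable>
        {
            private byte[] columnName;

            @Override
            protected void setup(Context context)
            {
                // grab the column name from the job config once, before any map() calls
                columnName = context.getConfiguration().get(CONF_COLUMN_NAME).getBytes();
            }

            @Override
            public void map(String key, SortedMap<byte[], IColumn> columns, Context context)
                throws IOException, InterruptedException
            {
                // key is the row key; columns is that row's data from Cassandra
                IColumn column = columns.get(columnName);
                if (column == null)
                    return;
                context.write(new Text(new String(column.value())), new IntWritable(1));
            }
        }

        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();
            conf.set(CONF_COLUMN_NAME, "text"); // the column each mapper should read
            Job job = new Job(conf, "myjob");
            job.setJarByClass(MyJob.class);
            job.setMapperClass(MyMapper.class);
            // ... reducer, output key/value classes, output path as usual ...

            // the Cassandra-specific part: what to scan and which columns to pull
            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            ConfigHelper.setColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
            SlicePredicate predicate = new SlicePredicate()
                .setColumn_names(Arrays.asList("text".getBytes()));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);
            // in 0.6 the server/port come from storage-conf.xml on the classpath;
            // it complains loudly at startup about anything that's missing

            job.waitForCompletion(true);
        }
    }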
For output:
- in your reducer, you can just write to Cassandra directly via
Thrift (sketched below). There is a built-in OutputFormat coming in
0.7, though it might still change before 0.7 final - it will queue up
mutations and write large batches all at once.
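For example, a reducer that opens its own Thrift connection and
inserts results directly could look something like this - a sketch
against the 0.6 Thrift API (the insert signature changes in 0.7),
with host, port, keyspace, and column family hard-coded as
placeholders:

    import java.io.IOException;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private TTransport transport;
        private Cassandra.Client client;

        @Override
        protected void setup(Context context) throws IOException
        {
            // open one connection per reduce task, not one per reduce() call
            transport = new TSocket("cassandra-host", 9160); // placeholders
            try
            {
                transport.open();
            }
            catch (Exception e)
            {
                throw new IOException(e);
            }
            client = new Cassandra.Client(new TBinaryProtocol(transport));
        }

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException
        {
            int sum = 0;
            for (IntWritable v : values)
                sum += v.get();
            try
            {
                // 0.6 API: insert(keyspace, key, path, value, timestamp, consistency)
                ColumnPath path = new ColumnPath("Standard1").setColumn("count".getBytes());
                client.insert("Keyspace1", key.toString(), path,
                              String.valueOf(sum).getBytes(),
                              System.currentTimeMillis(), ConsistencyLevel.ONE);
            }
            catch (Exception e)
            {
                throw new IOException(e);
            }
        }

        @Override
        protected void cleanup(Context context)
        {
            transport.close();
        }
    }

One insert per key is the simple version; batch_mutate would cut down
on round trips, which is essentially what the 0.7 OutputFormat will
do for you.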
On Aug 19, 2010, at 12:07 PM, Mark wrote:
Are there any examples/tutorials on the web for reading from and
writing to Cassandra with Hadoop?
I found the example in contrib/word_count but I really can't make
sense of it... a tutorial/explanation would help.
Thanks!
How does batching across all rows work? Does it just take an arbitrary
start key with a limit of x and then use the last key from that result
as the next start? Does that work with RandomPartitioner?
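In other words, something like this loop? (A rough sketch against the
0.6 get_range_slices Thrift API - the keyspace, column family, and
batch size of 1000 are arbitrary placeholders.)

    import java.util.List;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.KeyRange;
    import org.apache.cassandra.thrift.KeySlice;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;

    public class AllRowsSketch
    {
        static void scanAllRows(Cassandra.Client client) throws Exception
        {
            // pull up to 100 columns per row
            SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(new byte[0], new byte[0], false, 100));
            String start = ""; // empty start key = beginning of the range
            while (true)
            {
                KeyRange range = new KeyRange(1000).setStart_key(start).setEnd_key("");
                List<KeySlice> batch = client.get_range_slices(
                    "Keyspace1", new ColumnParent("Standard1"), predicate, range,
                    ConsistencyLevel.ONE);
                for (KeySlice row : batch)
                {
                    // the last key of the previous batch comes back again as the
                    // first row of this one, so skip it
                    if (row.getKey().equals(start))
                        continue;
                    // ... process row.getColumns() ...
                }
                if (batch.size() < 1000)
                    break; // short batch means we've hit the end of the range
                start = batch.get(batch.size() - 1).getKey();
            }
        }
    }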