If, like me, you prefer to write your jobs on the fly, try taking a look at Pig. Cassandra provides a loadfunc under contrib/pig/ in the source package, which allows you to load data directly from Cassandra.

--
Christian Decker
Software Architect
http://blog.snyke.net

On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

> I would check out http://wiki.apache.org/cassandra/HadoopSupport for more
> info. I'll try to explain a bit more here, but I don't think there's a
> tutorial out there yet.
>
> For input:
> - configure your main class where you're starting the mapreduce job the way
>   the word_count is configured (with either storage-conf or in your code via
>   the ConfigHelper). It will complain specifically about stuff you hadn't
>   configured - esp. important is your cassandra server and port.
> - the inputs to your mapper are going to be what's coming from cassandra -
>   so your key with a map of row values
> - you need to set your column name in your overridden setup method in your
>   mapper
> - for the reducer, nothing really changes from a normal map/reduce, unless
>   you want to output to cassandra
> - generally cassandra just provides an inputformat and split classes to
>   read from cassandra - you can find the guts in the
>   org.apache.cassandra.hadoop package
>
> For output:
> - in your reducer, you could just write to cassandra directly via thrift.
>   there is a built-in outputformat coming in 0.7 but it still might change
>   before 0.7 final - that will queue up changes so it will write large blocks
>   all at once.
>
> On Aug 19, 2010, at 12:07 PM, Mark wrote:
>
> > Are there any examples/tutorials on the web for reading/writing from
> > Cassandra into/from Hadoop?
> >
> > I found the example in contrib/word_count but I really can't make sense
> > of it... a tutorial/explanation would help.
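
To make the data flow in the input notes above a bit more concrete, here is a rough plain-Java sketch of the shape of the job: each mapper call gets a row key plus a sorted map of that row's columns, reads only the one configured column name, and the reducer sums counts as in any ordinary word count. It is only an illustration and deliberately uses no Hadoop or Cassandra classes. The class name WordCountSketch, the methods map, reduce and countWords, and the column name "text" are all made up for this example; the real job would use the input format from org.apache.cassandra.hadoop and Hadoop's own Mapper and Reducer types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java stand-in (hypothetical names): no Hadoop or Cassandra classes are used.
class WordCountSketch {

    // "Map" step: receives one row (key plus its columns), reads only the
    // configured column, and emits one entry per whitespace-separated word.
    static List<String> map(String rowKey, SortedMap<String, String> columns, String columnName) {
        List<String> words = new ArrayList<>();
        String value = columns.get(columnName);
        if (value == null) {
            return words; // this row has no such column, so emit nothing
        }
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // "Reduce" step: sums the occurrences of each word, as in a normal word count.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Drives both steps over a set of rows indexed by row key.
    static Map<String, Integer> countWords(Map<String, SortedMap<String, String>> rows, String columnName) {
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, String>> row : rows.entrySet()) {
            emitted.addAll(map(row.getKey(), row.getValue(), columnName));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        SortedMap<String, String> row1 = new TreeMap<>();
        row1.put("text", "pig hadoop pig");
        SortedMap<String, String> row2 = new TreeMap<>();
        row2.put("text", "hadoop cassandra");
        row2.put("other", "ignored words"); // not the configured column, so skipped

        Map<String, SortedMap<String, String>> rows = new HashMap<>();
        rows.put("key1", row1);
        rows.put("key2", row2);

        System.out.println(countWords(rows, "text"));
    }
}
```

In the actual job the column name would be read from the job configuration in the mapper's setup method rather than passed as an argument, and the column names and values would arrive as Cassandra's own types rather than strings.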