Re: Cassandra w/ Hadoop

Christian Decker Thu, 19 Aug 2010 10:35:23 -0700

If, like me, you prefer to write your jobs on the fly, try taking a look at
Pig. Cassandra provides a LoadFunc under contrib/pig/ in the source package
that lets you load data directly from Cassandra.
--
Christian Decker
Software Architect
http://blog.snyke.net



On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

> I would check out http://wiki.apache.org/cassandra/HadoopSupport for more
> info.  I'll try to explain a bit more here, but I don't think there's a
> tutorial out there yet.
>
> For input:
> - configure the main class that starts your MapReduce job the way the
> word_count example is configured (either with storage-conf or in your code
> via the ConfigHelper). It will complain specifically about anything you
> haven't configured; especially important are your Cassandra server and port.
> - the inputs to your mapper are what comes from Cassandra: your row key
> plus a map of that row's column values
> - you need to set your column name in your mapper's overridden setup method
> - for the reducer, nothing really changes from a normal MapReduce job,
> unless you want to output to Cassandra
> - in general, Cassandra just provides an InputFormat and split classes for
> reading from Cassandra; you can find the guts in the
> org.apache.cassandra.hadoop package
>
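To make the input side concrete, here is a plain-Java simulation of what such a mapper does with each row. It rests on several assumptions: it uses only the standard library rather than Hadoop's Mapper class or Cassandra's column types, the class name RowWordCountSketch and the configuration key sketch.column.name are hypothetical, and column values are assumed to be UTF-8 text. It illustrates the two points above: the column name is read once in setup(), and map() receives a row key plus a sorted map of that row's columns.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Plain-Java simulation of a Cassandra-fed mapper; NOT the Hadoop or Cassandra API.
public class RowWordCountSketch {
    // Hypothetical configuration key; the real word_count example uses its own.
    static final String CONF_COLUMN_NAME = "sketch.column.name";

    private String columnName;

    // Mirrors overriding Mapper.setup(): read the column name once per task.
    public void setup(Map<String, String> conf) {
        columnName = conf.get(CONF_COLUMN_NAME);
        if (columnName == null) {
            throw new IllegalStateException("column name not configured");
        }
    }

    // Mirrors map(): one call per row, with the row key and that row's columns.
    public void map(String rowKey, SortedMap<String, byte[]> columns,
                    Map<String, Integer> output) {
        byte[] value = columns.get(columnName);
        if (value == null) {
            return; // this row has no such column, so emit nothing
        }
        String text = new String(value, StandardCharsets.UTF_8);
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                output.merge(word, 1, Integer::sum); // count each word
            }
        }
    }

    public static void main(String[] args) {
        RowWordCountSketch mapper = new RowWordCountSketch();
        Map<String, String> conf = new HashMap<>();
        conf.put(CONF_COLUMN_NAME, "text");
        mapper.setup(conf);

        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("text", "to be or not to be".getBytes(StandardCharsets.UTF_8));
        row.put("other", "ignored".getBytes(StandardCharsets.UTF_8));

        Map<String, Integer> counts = new TreeMap<>();
        mapper.map("row1", row, counts);
        System.out.println(counts); // prints {be=2, not=1, or=1, to=2}
    }
}
```

In a real job the same logic would live in a subclass of org.apache.hadoop.mapreduce.Mapper, with results emitted through the task context rather than collected in a map.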
> For output:
> - in your reducer, you could just write to Cassandra directly via Thrift.
> There is a built-in OutputFormat coming in 0.7, though it might still change
> before 0.7 final; it will queue up changes and write them in large blocks
> all at once.
>
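The batching behaviour described for the upcoming OutputFormat can be approximated by hand in a reducer that writes via Thrift. The sketch below is hypothetical and stdlib-only: BatchingWriterSketch and Mutation are illustrative names, and the Consumer sink stands in for whatever client call actually sends a batch (for example a Thrift batch_mutate call). It is not the 0.7 OutputFormat's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a write-batching helper; NOT Cassandra's OutputFormat.
public class BatchingWriterSketch {
    // One column write destined for Cassandra.
    public static final class Mutation {
        final String rowKey;
        final String column;
        final String value;

        Mutation(String rowKey, String column, String value) {
            this.rowKey = rowKey;
            this.column = column;
            this.value = value;
        }
    }

    private final int batchSize;
    private final Consumer<List<Mutation>> sink; // stands in for a Thrift batch call
    private final List<Mutation> buffer = new ArrayList<>();

    public BatchingWriterSketch(int batchSize, Consumer<List<Mutation>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queue one write; send a full batch as soon as the buffer fills.
    public void write(String rowKey, String column, String value) {
        buffer.add(new Mutation(rowKey, column, value));
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is queued (e.g. from the reducer's cleanup step).
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand the sink its own copy
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingWriterSketch writer =
            new BatchingWriterSketch(3, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            writer.write("word" + i, "count", Integer.toString(i));
        }
        writer.flush(); // send the final partial batch
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

Flushing once more at the end of the reduce task keeps the last partial batch from being lost; the batch size trades memory use against the number of round trips to Cassandra.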
>
> On Aug 19, 2010, at 12:07 PM, Mark wrote:
>
> > Are there any examples/tutorials on the web for reading data from
> > Cassandra into Hadoop and writing it back?
> >
> > I found the example in contrib/word_count, but I really can't make sense
> > of it... a tutorial/explanation would help.
>
>
