On 8/19/10 10:34 AM, Christian Decker wrote:
If, like me, you prefer to write your jobs on the fly, try taking a look at Pig. Cassandra provides a loadfunc under contrib/pig/ in the source package, which allows you to load data directly from Cassandra.
--
Christian Decker
Software Architect
http://blog.snyke.net


On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

    I would check out http://wiki.apache.org/cassandra/HadoopSupport
    for more info.  I'll try to explain a bit more here, but I don't
    think there's a tutorial out there yet.

    For input:
    - Configure the main class that starts the mapreduce job the way
    word_count is configured (either with storage-conf or in your code
    via the ConfigHelper).  It will complain specifically about
    anything you haven't configured; your cassandra server and port
    are especially important.
    - The inputs to your mapper are whatever comes from cassandra:
    your key with a map of the row's values.
    - You need to set your column name in the overridden setup method
    of your mapper.
    - For the reducer, nothing really changes from a normal
    map/reduce, unless you want to output to cassandra.
    - Generally, cassandra just provides an inputformat and split
    classes for reading from cassandra.  You can find the guts in the
    org.apache.cassandra.hadoop package.
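
To make the mapper's input shape concrete, here is a minimal plain-Java sketch of the "key with a map of row values" idea. It deliberately uses stand-in types (String keys and a SortedMap<String, String>) rather than the real Hadoop or Cassandra classes, so the class name, the constructor argument that plays the role of setup(), and the sample data are all assumptions made for illustration. The actual types are the ones used in contrib/word_count.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative stand-in only: this is NOT the Hadoop Mapper API.
public class MapperShapeSketch {
    // In a real mapper this would be read from the job configuration in setup().
    private final String columnName;

    public MapperShapeSketch(String columnName) {
        this.columnName = columnName;
    }

    // Called once per row: the row key plus a sorted map of column name -> value.
    // Emits a (word, 1) pair for each word found in the configured column.
    public List<Map.Entry<String, Integer>> map(String rowKey, SortedMap<String, String> columns) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String value = columns.get(columnName);
        if (value == null) {
            return out; // this row does not have the column we were asked for
        }
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("text", "hello world hello");
        row.put("other", "ignored");
        MapperShapeSketch mapper = new MapperShapeSketch("text");
        for (Map.Entry<String, Integer> pair : mapper.map("row1", row)) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

The reducer side would then sum the 1s per word exactly as in any other word count, which is why nothing changes there.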

    For output:
    - In your reducer, you could just write to cassandra directly via
    thrift.  There is a built-in outputformat coming in 0.7, but it
    might still change before 0.7 final.  It will queue up changes
    so that it writes large blocks all at once.
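
The "queue up changes and write large blocks" idea can be sketched in plain Java as below. This is not the 0.7 outputformat and makes no claim about how it is implemented. The class name, the String stand-in for a change, and the Consumer that stands in for whatever actually sends a batch to cassandra (for example, a thrift call made from your reducer) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: buffers changes and hands them to a sink in blocks.
public class BatchingWriterSketch {
    private final int batchSize;
    private final Consumer<List<String>> sink; // hypothetical stand-in for the real write call
    private final List<String> pending = new ArrayList<>();

    public BatchingWriterSketch(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Queue one change; send a block as soon as the buffer is full.
    public void write(String change) {
        pending.add(change);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is queued as a single block (no-op when the queue is empty).
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its own list
        pending.clear();
    }

    // Call at the end of the reduce task so the last partial block is not lost.
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        BatchingWriterSketch writer =
            new BatchingWriterSketch(2, batch -> System.out.println("writing " + batch));
        for (int i = 0; i < 5; i++) {
            writer.write("change-" + i);
        }
        writer.close();
    }
}
```

The one thing that is easy to forget with this pattern is the final flush: without the call to close(), any changes left in a partial block never reach the sink.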


    On Aug 19, 2010, at 12:07 PM, Mark wrote:

    > Are there any examples/tutorials on the web for reading/writing
    > from Cassandra into/from Hadoop?
    >
    > I found the example in contrib/word_count but I really can't
    > make sense of it... a tutorial/explanation would help.


That's definitely an option, and I'll probably lean towards it in the near future. I am just trying to get a complete understanding of the whole infrastructure before working with higher-level features.

Also, the same problem exists there... I need a nice tutorial :)
