RE: Map/Reduce Cassandra Output

Stu Hood Mon, 19 Apr 2010 11:20:38 -0700

If you used that snippet of code, all connections would go through the same 
seed. The input code does additional work to determine which nodes hold 
particular key ranges, and then connects to those nodes directly.


----

For outputting from Hadoop to Cassandra, you may want to consider using a Java 
client like Hector, which will handle the load balancing for you.

http://github.com/rantav/hector
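
To illustrate what client-side balancing means here, below is a minimal
sketch of round-robin host selection. It is not Hector's API or its
implementation: the class RoundRobinHosts, its constructor arguments, and
the host names are all hypothetical. The start offset is there on the
assumption that each reduce task runs in its own JVM, so every task would
otherwise begin at the first host; passing something that differs per task
(for example the task's partition number) spreads the first connections out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of Cassandra or Hector. */
class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger next;

    /**
     * @param hosts       candidate Cassandra nodes (assumed to be known up front)
     * @param startOffset where to start in the rotation, e.g. a per-task number
     */
    RoundRobinHosts(List<String> hosts, int startOffset) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = new ArrayList<>(hosts); // defensive copy
        this.next = new AtomicInteger(startOffset);
    }

    /** Returns the next host in rotation; safe to call from several threads. */
    String nextHost() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }
}
```

A reducer could then open its Thrift socket against nextHost() instead of
the first seed. A real client would also need to handle nodes that are down,
which this sketch does not attempt.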

Thanks,
Stu

-----Original Message-----
From: "Sonny Heer" <sonnyh...@gmail.com>
Sent: Monday, April 19, 2010 11:29am
To: cassandra-u...@incubator.apache.org
Subject: Map/Reduce Cassandra Output

Unlike the wordcount example, my input source is a directory, and I
have a split class and record reader defined.

Also unlike wordcount, I need to insert into Cassandra during the
reduce step.  I notice that the wordcount input code obtains a handle
on a Cassandra client like this:

        TSocket socket = new TSocket(
                DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(),
                DatabaseDescriptor.getThriftPort());
        TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
        Cassandra.Client client = new Cassandra.Client(binaryProtocol);

Would all Hadoop nodes go to the same seed if I use this code to
insert data, with no load balancing?  Has this already been done
somewhere in the Cassandra code?
