Unlike the wordcount example, my input source is a directory, and I have a split class and record reader defined.
Also unlike wordcount, I need to insert into Cassandra during the reduce phase. I notice that for its input, the wordcount example retrieves a handle on a Cassandra client like this:

    TSocket socket = new TSocket(DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(), DatabaseDescriptor.getThriftPort());
    TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
    Cassandra.Client client = new Cassandra.Client(binaryProtocol);

If I use this code to insert data, would all Hadoop nodes connect to the same seed, with no load balancing across the cluster? Has balanced output already been implemented somewhere in the Cassandra code?
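
To make the balancing question concrete, here is a minimal sketch of the kind of thing I have in mind: each reducer picks one host at random from a configured list instead of always taking the first seed. The class and method names (CassandraHostPicker, parseHosts, pickHost) are hypothetical and not part of Cassandra or Hadoop, and the comma-separated host list is an assumption about how the nodes would be supplied to the job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Cassandra or Hadoop.
final class CassandraHostPicker {
    private CassandraHostPicker() {}

    // Parses a comma-separated host list (assumed to come from job
    // configuration), skipping blank entries.
    static List<String> parseHosts(String commaSeparated) {
        List<String> hosts = new ArrayList<>();
        for (String h : commaSeparated.split(",")) {
            String trimmed = h.trim();
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed);
            }
        }
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts configured");
        }
        return hosts;
    }

    // Picks one host uniformly at random, so that different reducers
    // tend to connect to different nodes rather than the same seed.
    static String pickHost(List<String> hosts, Random random) {
        return hosts.get(random.nextInt(hosts.size()));
    }
}
```

The picked host would then be passed to the TSocket constructor in place of DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(). Random choice only spreads connections evenly on average; it does not take data locality or node load into account.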