Try this one then. It reads from Cassandra and writes back to Cassandra, but you could change the write to go wherever you like.

getConf().set(IN_COLUMN_NAME, columnName);

Job job = new Job(getConf(), "ProcessRawXml");
job.setInputFormatClass(ColumnFamilyInputFormat.class);
job.setNumReduceTasks(0);

job.setJarByClass(StartJob.class);
job.setMapperClass(ParseMapper.class);
job.setOutputKeyClass(ByteBuffer.class);
//job.setOutputValueClass(Text.class);

job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
job.setInputFormatClass(ColumnFamilyInputFormat.class);

ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
//org.apache.cassandra.dht.LocalPartitioner
ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);

SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));

// SliceRange slice_range = new SliceRange();
// slice_range.setStart(ByteBufferUtil.bytes(startPoint));
// slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
//
// predicate.setSlice_range(slice_range);
ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

job.waitForCompletion(true);

ts/jobs/ProcessXml/

On 1/16/13 9:22 AM, "" <> wrote:

>I don't want to write to Cassandra as it replicates data from another
>datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read
>data from it. I would like to use the same configuration as

 but I want to know if there are alternatives to the DataStax Enterprise package.

Thanks
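
A side note on the ProcessRawXml job setup at the top of this thread: the slice predicate takes its column names as ByteBuffers, and the job's output key class is ByteBuffer as well. The following is a minimal stand-alone sketch of that string-to-ByteBuffer round trip using only the JDK. The class ColumnNameBytes, its method names, and the column name "xml" are hypothetical and only for illustration; the sketch assumes column names are UTF-8 text, which is what Cassandra's ByteBufferUtil.bytes(String) produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cassandra or of the jobs in this thread.
final class ColumnNameBytes {

    // Encode a column name as UTF-8 bytes wrapped in a ByteBuffer.
    static ByteBuffer encode(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a duplicate of the buffer so the caller's position is left
    // untouched and the same buffer can be read again later.
    static String decode(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer encoded = encode("xml"); // "xml" is an assumed column name
        System.out.println(encoded.remaining() + " bytes -> " + decode(encoded));
    }
}
```

Decoding from a duplicate matters in a mapper, where the same key or column-name buffer may be read more than once; decoding the buffer directly would advance its position and make a second read come back empty.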
On Jan 16, 2013, at 3:59 PM, James Schappet <> wrote:

 Here are a few examples I have worked on, reading from xml.gz files and then writing to Cassandra.




 You will also need:




 These examples are Hadoop Jobs using Cassandra as the Data Store.

 This one is a good place to start.


ic
ts/jobs/LoadMedline/

 ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
 ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, outputPath);

 job.setMapperClass(MapperToCassandra.class);
 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(Text.class);

 // Writing output to Cassandra
 //job.setReducerClass(ReducerToCassandra.class);
 job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

 ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
 //org.apache.cassandra.dht.LocalPartitioner
 ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
 ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");




 On 1/16/13 7:37 AM, "" <> wrote:

 Hi,

 I know that the DataStax Enterprise package provides Brisk, but is there a community version? Is it easy to interface Hadoop with Cassandra as the storage, or do we absolutely have to use Brisk for that?
 I know CassandraFS is natively available in Cassandra 1.2, the version I use, so is there a way or procedure to interface Hadoop with Cassandra as the storage?

 Thanks 