Try this one, then. It reads from Cassandra and writes back to Cassandra, but you could change the write to go wherever you like.

    getConf().set(IN_COLUMN_NAME, columnName);
    Job job = new Job(getConf(), "ProcessRawXml");
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    job.setNumReduceTasks(0);
    job.setJarByClass(StartJob.class);
    job.setMapperClass(ParseMapper.class);
    job.setOutputKeyClass(ByteBuffer.class);
    //job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
    //org.apache.cassandra.dht.LocalPartitioner
    ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
    ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
    ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
    SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
    // SliceRange slice_range = new SliceRange();
    // slice_range.setStart(ByteBufferUtil.bytes(startPoint));
    // slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
    //
    // predicate.setSlice_range(slice_range);
    ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
    job.waitForCompletion(true);

https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java

On 1/16/13 9:22 AM, "cscetbon....@orange.com" <cscetbon....@orange.com> wrote:

>I don't want to write to Cassandra as it replicates data from another
>datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read
>data from it. I would like to use the same configuration as
>http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster
>but I want to know if there are alternatives to DataStax Enterprise
>package.
>
>Thanks
>On Jan 16, 2013, at 3:59 PM, James Schappet <jschap...@gmail.com> wrote:
>
>> Here are a few examples I have worked on, reading from xml.gz files then
>> writing to Cassandra.
>>
>> https://github.com/jschappet/medline
>>
>> You will also need:
>>
>> https://github.com/jschappet/medline-base
>>
>> These examples are Hadoop Jobs using Cassandra as the Data Store.
>>
>> This one is a good place to start.
>>
>> https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java
>>
>>     ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
>>     ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, outputPath);
>>
>>     job.setMapperClass(MapperToCassandra.class);
>>     job.setOutputKeyClass(Text.class);
>>     job.setOutputValueClass(Text.class);
>>
>>     LOG.info("Writing output to Cassandra");
>>     //job.setReducerClass(ReducerToCassandra.class);
>>     job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
>>
>>     ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
>>     //org.apache.cassandra.dht.LocalPartitioner
>>     ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
>>     ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
>>
>> On 1/16/13 7:37 AM, "cscetbon....@orange.com" <cscetbon....@orange.com> wrote:
>>
>>> Hi,
>>>
>>> I know that the DataStax Enterprise package provides Brisk, but is there a
>>> community version? Is it easy to interface Hadoop with Cassandra as the
>>> storage, or do we absolutely have to use Brisk for that?
>>> I know CassandraFS is natively available in Cassandra 1.2, the version I
>>> use, so is there a way/procedure to interface Hadoop with Cassandra as
>>> the storage?
>>>
>>> Thanks
>>>
>>> This message and its attachments may contain confidential or privileged
>>> information that may be protected by law;
>>> they should not be distributed, used or copied without authorisation.
>>> If you have received this email in error, please notify the sender and
>>> delete this message and its attachments.
>>> As emails may be altered, France Telecom - Orange is not liable for
>>> messages that have been modified, changed or falsified.
>>> Thank you.
>>>
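
A footnote on the slice predicate in the job setup at the top of this thread: setColumn_names takes a list of ByteBuffers, not Strings, which is why the column name goes through ByteBufferUtil.bytes first. The sketch below shows what I understand that conversion to amount to (UTF-8 bytes wrapped in a ByteBuffer), using only the JDK so it runs without the Cassandra jars. The class name ColumnNames, its methods, and the column names "rawXml" and "title" are made up for illustration; they are not part of the medline code or of Cassandra.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, JDK only: mimics turning column-name strings into
// the ByteBuffer form that a slice predicate's column list expects.
public class ColumnNames {

    // Encode one column name as UTF-8 bytes wrapped in a ByteBuffer.
    public static ByteBuffer encode(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a buffer back to a String. duplicate() gives an independent
    // position, so the caller's buffer is not consumed by the read.
    public static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    // Encode several names at once, preserving their order.
    public static List<ByteBuffer> encodeAll(String... names) {
        List<ByteBuffer> out = new ArrayList<>();
        for (String n : names) {
            out.add(encode(n));
        }
        return out;
    }

    public static void main(String[] args) {
        // "rawXml" and "title" are placeholder column names.
        for (ByteBuffer col : encodeAll("rawXml", "title")) {
            System.out.println(decode(col) + " (" + col.remaining() + " bytes)");
        }
    }
}
```

To read more than one column in the job, a list built this way could be passed to setColumn_names in place of the single-element Arrays.asList(...) call, assuming the column names in your column family are stored as UTF-8 text.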