Try this one, then. It reads from Cassandra and writes back to Cassandra,
but you could change the output to wherever you would like.



    // Store the column name in the configuration before the Job copies it.
    getConf().set(IN_COLUMN_NAME, columnName);

    Job job = new Job(getConf(), "ProcessRawXml");
    job.setJarByClass(StartJob.class);
    job.setMapperClass(ParseMapper.class);
    job.setNumReduceTasks(0);  // map-only job, no reducers

    // Input side: read rows from Cassandra.
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setInputColumnFamily(job.getConfiguration(),
            KEYSPACE, COLUMN_FAMILY);

    // Output side: write back to Cassandra (change this part to write elsewhere).
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    job.setOutputKeyClass(ByteBuffer.class);
    //job.setOutputValueClass(Text.class);
    ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
            KEYSPACE, COLUMN_FAMILY);

    // Connection settings for the Cassandra cluster.
    ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
    ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
    //org.apache.cassandra.dht.LocalPartitioner
    ConfigHelper.setPartitioner(job.getConfiguration(),
            "org.apache.cassandra.dht.RandomPartitioner");

    // Fetch only the named column from each row.
    SlicePredicate predicate = new SlicePredicate()
            .setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
    // Alternative: select a range of columns instead of a single name.
    //SliceRange slice_range = new SliceRange();
    //slice_range.setStart(ByteBufferUtil.bytes(startPoint));
    //slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
    //predicate.setSlice_range(slice_range);
    ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

    job.waitForCompletion(true);
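
The slice predicate above takes column names as ByteBuffers rather than
Strings, which is what ByteBufferUtil.bytes(columnName) is for. As a rough
illustration of that conversion using only the JDK (not the Cassandra helper
itself), here is a minimal sketch. The class name ColumnNameBytes, the
encode/decode helpers, and the column name "raw_xml" are made up for the
example, and it assumes the names are UTF-8 text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ColumnNameBytes {

    // Encode a column name as UTF-8 bytes wrapped in a ByteBuffer.
    static ByteBuffer encode(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a buffer back to text. Reading from a duplicate leaves the
    // original buffer's position untouched, so it can be read again later.
    static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        String columnName = "raw_xml";  // hypothetical column name

        // Same shape as the list handed to setColumn_names(...) above.
        List<ByteBuffer> columnNames = Arrays.asList(encode(columnName));

        ByteBuffer first = columnNames.get(0);
        System.out.println(decode(first));
        System.out.println(decode(first));  // second decode gives the same text
    }
}
```

Decoding from a duplicate matters in a mapper, where the same key or column
buffer is often read more than once.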


https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java








On 1/16/13 9:22 AM, "cscetbon....@orange.com" <cscetbon....@orange.com>
wrote:

>I don't want to write to Cassandra, as it replicates data from another
>datacenter. I just want to use Hadoop jobs (Pig and Hive) to read
>data from it. I would like to use the same configuration as
>http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster
>but I want to know if there are alternatives to the DataStax Enterprise
>package.
>
>Thanks
>On Jan 16, 2013, at 3:59 PM, James Schappet <jschap...@gmail.com> wrote:
>
>> Here are a few examples I have worked on, reading from xml.gz files and
>> then writing to Cassandra.
>> 
>> 
>> https://github.com/jschappet/medline
>> 
>> You will also need:
>> 
>> https://github.com/jschappet/medline-base
>> 
>> 
>> 
>> These examples are Hadoop Jobs using Cassandra as the Data Store.
>> 
>> This one is a good place to start.
>> 
>> https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java
>> 
>>     ConfigHelper.setInputColumnFamily(job.getConfiguration(),
>>             KEYSPACE, COLUMN_FAMILY);
>>     ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
>>             KEYSPACE, outputPath);
>>
>>     job.setMapperClass(MapperToCassandra.class);
>>     job.setOutputKeyClass(Text.class);
>>     job.setOutputValueClass(Text.class);
>>
>>     LOG.info("Writing output to Cassandra");
>>     //job.setReducerClass(ReducerToCassandra.class);
>>     job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
>>
>>     ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
>>     //org.apache.cassandra.dht.LocalPartitioner
>>     ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
>>     ConfigHelper.setPartitioner(job.getConfiguration(),
>>             "org.apache.cassandra.dht.RandomPartitioner");
>> 
>> 
>> 
>> 
>> 
>> 
>> On 1/16/13 7:37 AM, "cscetbon....@orange.com" <cscetbon....@orange.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> I know that the DataStax Enterprise package provides Brisk, but is
>>> there a community version? Is it easy to interface Hadoop with
>>> Cassandra as the storage, or do we absolutely have to use Brisk for
>>> that? I know CassandraFS is natively available in Cassandra 1.2, the
>>> version I use, so is there a way/procedure to interface Hadoop with
>>> Cassandra as the storage?
>>> 
>>> Thanks 
>>> 
>>>________________________________________________________________________
>>>__
>>> _______________________________________________
>>> 
>>> Ce message et ses pieces jointes peuvent contenir des informations
>>> confidentielles ou privilegiees et ne doivent donc
>>> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
>>> recu ce message par erreur, veuillez le signaler
>>> a l'expediteur et le detruire ainsi que les pieces jointes. Les
>>>messages
>>> electroniques etant susceptibles d'alteration,
>>> France Telecom - Orange decline toute responsabilite si ce message a
>>>ete
>>> altere, deforme ou falsifie. Merci.
>>> 
>>> This message and its attachments may contain confidential or privileged
>>> information that may be protected by law;
>>> they should not be distributed, used or copied without authorisation.
>>> If you have received this email in error, please notify the sender and
>>> delete this message and its attachments.
>>> As emails may be altered, France Telecom - Orange is not liable for
>>> messages that have been modified, changed or falsified.
>>> Thank you.
>>> 
>> 
>> 
>
>

