Thanks, I understand that your code uses Cassandra's Hadoop interface to read 
from it in a job. However, I would like to know how to bring the pieces 
(Hive + Pig + Hadoop) together with Cassandra as the storage layer, not just 
to get code to test it. I have found the repository 
https://github.com/riptano/brisk which might be a good starting point for it
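
For the Pig half, stock Apache Cassandra ships a Pig LoadFunc
(org.apache.cassandra.hadoop.pig.CassandraStorage) in its source tree, so a
Pig job can read a column family without DSE. A minimal sketch of what I have
in mind, assuming Cassandra 1.x's bundled CassandraStorage and a hypothetical
keyspace/column family `ks`/`cf` (connection details come from environment
variables read by the loader):

```
-- environment assumed to be exported before launching pig:
--   PIG_INITIAL_ADDRESS=localhost
--   PIG_RPC_PORT=9160
--   PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
rows = LOAD 'cassandra://ks/cf' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
-- each tuple is (key, {(column_name, value), ...}); count rows as a smoke test
grouped = GROUP rows ALL;
counted = FOREACH grouped GENERATE COUNT(rows);
DUMP counted;
```

Hive is the piece Brisk added (a Cassandra-backed Hive storage handler), which
is why it seems harder to assemble from stock Apache components.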

Regards 

On Jan 16, 2013, at 4:27 PM, James Schappet <jschap...@gmail.com> wrote:

> Try this one then; it reads from Cassandra, then writes back to Cassandra,
> but you could change the write to go wherever you would like.
> 
> 
> 
>     getConf().set(IN_COLUMN_NAME, columnName);
> 
>     Job job = new Job(getConf(), "ProcessRawXml");
>     job.setInputFormatClass(ColumnFamilyInputFormat.class);
>     job.setNumReduceTasks(0);
> 
>     job.setJarByClass(StartJob.class);
>     job.setMapperClass(ParseMapper.class);
>     job.setOutputKeyClass(ByteBuffer.class);
>     //job.setOutputValueClass(Text.class);
>     job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
> 
>     ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
>     ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
>     //org.apache.cassandra.dht.LocalPartitioner
>     ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
>     ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
>     ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
> 
>     SlicePredicate predicate = new SlicePredicate()
>             .setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
>     //SliceRange slice_range = new SliceRange();
>     //slice_range.setStart(ByteBufferUtil.bytes(startPoint));
>     //slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
>     //predicate.setSlice_range(slice_range);
>     ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
> 
>     job.waitForCompletion(true);
> 
> 
> https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java
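> 
> As an aside, ByteBufferUtil.bytes(columnName) in the snippet just wraps the
> UTF-8 bytes of the string in a ByteBuffer. If you want to experiment without
> the Cassandra jars on the classpath, a JDK-only sketch of the same conversion
> (class and method names here are my own, for illustration):
> 
> ```java
> import java.nio.ByteBuffer;
> import java.nio.charset.StandardCharsets;
> 
> public class ColumnNameBytes {
>     // Equivalent of org.apache.cassandra.utils.ByteBufferUtil.bytes(String):
>     // wrap the UTF-8 encoding of the string in a ByteBuffer.
>     static ByteBuffer bytes(String s) {
>         return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
>     }
> 
>     // Inverse, handy when decoding column names or values in a mapper.
>     static String string(ByteBuffer b) {
>         byte[] raw = new byte[b.remaining()];
>         b.duplicate().get(raw);  // duplicate() so the caller's position is untouched
>         return new String(raw, StandardCharsets.UTF_8);
>     }
> 
>     public static void main(String[] args) {
>         ByteBuffer col = bytes("xml");
>         System.out.println(col.remaining());  // 3 (UTF-8 bytes of "xml")
>         System.out.println(string(col));      // round-trips to "xml"
>     }
> }
> ```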
> 
> 
> 
> 
> 
> 
> 
> 
> On 1/16/13 9:22 AM, "cscetbon....@orange.com" <cscetbon....@orange.com>
> wrote:
> 
>> I don't want to write to Cassandra as it replicates data from another
>> datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read
>> data from it. I would like to use the same configuration as
>> http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster
>> but I want to know if there are alternatives to DataStax Enterprise
>> package.
>> 
>> Thanks
>> On Jan 16, 2013, at 3:59 PM, James Schappet <jschap...@gmail.com> wrote:
>> 
>>> Here are a few examples I have worked on, reading from xml.gz files and
>>> then writing to Cassandra.
>>> 
>>> 
>>> https://github.com/jschappet/medline
>>> 
>>> You will also need:
>>> 
>>> https://github.com/jschappet/medline-base
>>> 
>>> 
>>> 
>>> These examples are Hadoop jobs using Cassandra as the data store.
>>> 
>>> This one is a good place to start.
>>> 
>>> https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java
>>> 
>>>     ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
>>>     ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, outputPath);
>>> 
>>>     job.setMapperClass(MapperToCassandra.class);
>>>     job.setOutputKeyClass(Text.class);
>>>     job.setOutputValueClass(Text.class);
>>> 
>>>     LOG.info("Writing output to Cassandra");
>>>     //job.setReducerClass(ReducerToCassandra.class);
>>>     job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
>>> 
>>>     ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
>>>     //org.apache.cassandra.dht.LocalPartitioner
>>>     ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
>>>     ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 1/16/13 7:37 AM, "cscetbon....@orange.com" <cscetbon....@orange.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I know that the DataStax Enterprise package provides Brisk, but is there
>>>> a community version? Is it easy to interface Hadoop with Cassandra as
>>>> the storage layer, or do we absolutely have to use Brisk for that?
>>>> I know CassandraFS is natively available in Cassandra 1.2, the version I
>>>> use, so is there a way/procedure to interface Hadoop with Cassandra as
>>>> the storage layer?
>>>> 
>>>> Thanks
>>>> 
>>>> _________________________________________________________________________
>>>> 
>>>> This message and its attachments may contain confidential or privileged
>>>> information that may be protected by law;
>>>> they should not be distributed, used or copied without authorisation.
>>>> If you have received this email in error, please notify the sender and
>>>> delete this message and its attachments.
>>>> As emails may be altered, France Telecom - Orange is not liable for
>>>> messages that have been modified, changed or falsified.
>>>> Thank you.
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

