Re: Reading Cassandra Data From Pig/Hadoop

James Schappet Fri, 30 May 2014 10:11:36 -0700

To specify your Cassandra cluster, you only need to define one node:

In your profile or batch command, set and export these variables:


export PIG_HOME=<PATH TO PIG INSTALL>

export PIG_INITIAL_ADDRESS=localhost

export PIG_RPC_PORT=9160

# the partitioner must match your cassandra partitioner

export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
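
If it helps, here is a small sanity check you could run before launching a script. It is only a sketch: check_pig_cassandra_env is a hypothetical helper name of my own, not something shipped with Pig or Cassandra. It just confirms that the four variables above are set and prints their values.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig or Cassandra): report whether the
# connection variables listed above are set in the current environment.
check_pig_cassandra_env() {
    missing=0
    for name in PIG_HOME PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        else
            echo "$name=$value"
        fi
    done
    # Returns 0 when every variable is set, 1 otherwise.
    return "$missing"
}
```

Calling check_pig_cassandra_env in the same shell that exported the variables, just before you start Pig, should catch a forgotten export early.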




http://www.schappet.com/pig_cassandra_bulk_load/

—Jimmy 



On May 30, 2014, at 11:50 AM, Alex McLintock <a...@owal.co.uk> wrote:

> I am reasonably experienced with Hadoop and Pig but less so with Cassandra. I 
> have been banging my head against the wall as all the documentation assumes I 
> know something...
> 
> I am using Apache's tarball of Cassandra 1.something and I see that there are 
> some example pig scripts and a shell script to run them with the cassandra 
> jars. 
> 
> What I don't understand is how you tell the pig script which machine the 
> cassandra cluster talks to. You only specify the keyspace right - which 
> roughly corresponds to the database/table, but not which cluster. 
> 
> Can you tell what I have missed? Do the Hadoop nodes HAVE to be on the same 
> machines as the Cassandra nodes?
> 
> I am using CQL storage I think.
> 
> eg
> 
> 
> -- CqlStorage
> libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
> 
> book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
> 
> etc etc
> 
> 
> 
> Thanks all...
> 
> 
> 
>
