Reading Cassandra Data From Pig/Hadoop

Alex McLintock Fri, 30 May 2014 09:51:07 -0700

I am reasonably experienced with Hadoop and Pig but less so with Cassandra.
I have been banging my head against the wall because all the documentation
assumes I already know something that I don't.


I am using Apache's tarball of Cassandra 1.something and I see that there
are some example Pig scripts and a shell script to run them with the
Cassandra jars.

What I don't understand is how you tell the Pig script which machine to
contact to reach the Cassandra cluster. You only specify the keyspace,
right? That roughly corresponds to the database/table, but it does not say
which cluster.

Can you tell what I have missed? Do the Hadoop nodes HAVE to be on the
same machines as the Cassandra nodes?

I am using CQL storage (CqlStorage), I think.

e.g.

-- CqlStorage: the URL has the form cql://<keyspace>/<table>
libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
-- Keep only the rows whose C_OUT_TY column is 'BM'
book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
-- etc.
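
For context on the cluster question above, here is a minimal sketch of how
the example wrapper script is usually pointed at a cluster. It assumes the
environment variable names described in the examples/pig README of the
tarball (PIG_INITIAL_ADDRESS, PIG_RPC_PORT, PIG_PARTITIONER); the host name
is hypothetical, and the partitioner shown is only a guess that has to match
whatever the cluster actually uses.

```shell
# Assumed variable names, as described in the examples/pig README.
# cassandra-host.example.com is a hypothetical address: use a real node.
export PIG_INITIAL_ADDRESS=cassandra-host.example.com  # any one node of the target cluster
export PIG_RPC_PORT=9160                               # Thrift RPC port (9160 is the usual default)
# Assumption: the cluster uses Murmur3Partitioner; change this if it does not.
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# The wrapper script in examples/pig would then be run with the Pig script
# as its argument, for example (my_script.pig is a placeholder name):
#   bin/pig_cassandra -x local my_script.pig
```

If this is right, the cluster is chosen outside the Pig script, and the
address is just a network endpoint, so the Hadoop nodes would not strictly
have to run on the Cassandra machines, although co-locating them is the
usual advice for data locality.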


Thanks all...
