I am reasonably experienced with Hadoop and Pig, but less so with Cassandra. I have been banging my head against the wall because all the documentation assumes I already know something I don't...
I am using Apache's tarball of Cassandra 1.something, and I see that it ships some example Pig scripts plus a shell script that runs them with the Cassandra jars on the classpath.

What I don't understand is how you tell the Pig script which machine(s) the Cassandra cluster is running on. You only specify the keyspace, right? That roughly corresponds to the database/table, but it says nothing about which cluster. Can you tell me what I have missed? Do the Hadoop nodes HAVE to be on the same machines as the Cassandra nodes?

I think I am using CQL storage, e.g.:

    -- CqlStorage
    libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
    book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
    -- etc.

Thanks, all...