To specify your Cassandra cluster, you only need to define one node. In your profile or batch script, set and export these variables:

export PIG_HOME=<PATH TO PIG INSTALL>
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
# the partitioner must match your cassandra partitioner
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

http://www.schappet.com/pig_cassandra_bulk_load/

-Jimmy

On May 30, 2014, at 11:50 AM, Alex McLintock <a...@owal.co.uk> wrote:

> I am reasonably experienced with Hadoop and Pig but less so with Cassandra. I
> have been banging my head against the wall as all the documentation assumes I
> know something...
>
> I am using Apache's tarball of Cassandra 1.something and I see that there are
> some example pig scripts and a shell script to run them with the cassandra
> jars.
>
> What I don't understand is how you tell the pig script which machine the
> cassandra cluster talks to. You only specify the keyspace right - which
> roughly corresponds to the database/table, but not which cluster.
>
> Can you tell what I have missed? Does the hadoop nodes HAVE to be on the same
> machines as the Cassandra nodes?
>
> I am using CQL storage I think.
>
> eg
>
> -- CqlStorage
> libdata = LOAD 'cql://libdata/libout' USING CqlStorage();
>
> book_by_mail = FILTER libdata BY C_OUT_TY == 'BM';
>
> etc etc
>
> Thanks all...