On all tasktrackers, I see:

java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS environment variable not set
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:67)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
    at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

On Thu, Jan 3, 2013 at 10:45 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Instead, I get an error from CassandraStorage that the initial address
> isn't set (on the slave, the master is ok).
>
> Can you post the full error ?
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2013, at 11:15 AM, William Oberman <ober...@civicscience.com>
> wrote:
>
> Anyone ever try to read or write directly between EMR <-> Cassandra?
>
> I'm running various Cassandra resources in Ec2, so the "physical
> connection" part is pretty easy using security groups. But, I'm having
> some configuration issues. I have managed to get Cassandra + Hadoop
> working in the past using a DIY hadoop cluster, and looking at the
> configurations in the two environments (EMR vs DIY), I'm not sure what's
> different that is causing my failures... I should probably note I'm using
> the Pig integration of Cassandra.
>
> Versions: Hadoop 1.0.3, Pig 0.10, Cassandra 1.1.7.
>
> I'm 99% sure I have classpaths working (because I didn't at first, and now
> EMR can find and instantiate CassandraStorage on master and slaves). What
> isn't working are the system variables. In my DIY cluster, all I needed to
> do was:
> -------
> export PIG_INITIAL_ADDRESS=XXX
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> ----------
> And the task trackers somehow magically picked up the values (I never
> questioned how/why). But, in EMR, they do not. Instead, I get an error
> from CassandraStorage that the initial address isn't set (on the slave, the
> master is ok).
>
> My DIY cluster used CDH3, which was hadoop 0.20.something. So, maybe the
> problem is a different version of hadoop?
>
> Looking at the CassandraStorage class, I realize I have no idea how it
> used to work, since it only seems to look at System variables. Those
> variables are set on the Job.getConfiguration object. I don't know how
> that part of hadoop works though... do variables that get set on Job on the
> master get propagated to the task threads? I do know that on my DIY
> cluster, I do NOT set those system variables on the slaves...
>
> Thanks!
>
> will
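
One thing that may be worth trying for the environment variable question above: Hadoop 1.x has a mapred.child.env job property that adds environment variables to the task child JVMs, which is where CassandraStorage.setStoreLocation fails in the trace. Below is a minimal sketch, assuming the same three variables as in the DIY setup and a hypothetical script name (myscript.pig); XXX is still the placeholder from the original message. It only builds and prints the pig command so the value can be checked first. Whether EMR honors the property, and whether it resolves this particular error, is an assumption and has not been verified here.

```shell
#!/bin/sh
# Values copied from the DIY cluster exports; XXX is a placeholder, not a real address.
PIG_INITIAL_ADDRESS=XXX
PIG_RPC_PORT=9160
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

# mapred.child.env (Hadoop 1.x) takes a comma-separated list of NAME=value pairs
# that are added to the environment of each task child process.
CHILD_ENV="PIG_INITIAL_ADDRESS=${PIG_INITIAL_ADDRESS},PIG_RPC_PORT=${PIG_RPC_PORT},PIG_PARTITIONER=${PIG_PARTITIONER}"

# myscript.pig is a hypothetical script name. The command is printed, not run,
# so it can be inspected before trying it on the cluster.
CMD="pig -Dmapred.child.env=${CHILD_ENV} myscript.pig"
echo "${CMD}"
```

If the printed command looks right, it can be run on the master as-is. The -D option should come before the script name so that Pig treats it as a property rather than a script argument.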