Re: AWS EMR <-> Cassandra

William Oberman Fri, 04 Jan 2013 06:05:46 -0800

On all tasktrackers, I see:
java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS environment variable not set
        at org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:67)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
        at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
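
For illustration only, here is a minimal sketch of one way an exported variable can be visible on the master yet missing in a task process. It assumes the task JVM does not inherit the environment of the shell that ran the exports, which this thread has not confirmed. The host name is a placeholder, and `env -i` merely stands in for "a process started elsewhere"; it is not how Hadoop launches tasks.

```shell
#!/bin/sh
# Placeholder address for illustration; not a value from this thread.
export PIG_INITIAL_ADDRESS=cassandra.example.invalid

# A child started from this shell inherits the exported variable.
inherited=$(/bin/sh -c 'echo "${PIG_INITIAL_ADDRESS:-}"')

# env -i starts the child with an empty environment, standing in for a
# process that was not launched from this shell (an assumption about
# how a task JVM on another node might behave).
isolated=$(env -i /bin/sh -c 'echo "${PIG_INITIAL_ADDRESS:-}"')

echo "inherited child sees: '${inherited}'"
echo "isolated child sees:  '${isolated}'"
```

If the second value comes back empty, any code in that process that relies only on the environment would find the variable unset, which matches the shape of the exception above.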



On Thu, Jan 3, 2013 at 10:45 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Instead, I get an error from CassandraStorage that the initial address
> isn't set (on the slave, the master is ok).
>
> Can you post the full error?
>
> Cheers
>    -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2013, at 11:15 AM, William Oberman <ober...@civicscience.com>
> wrote:
>
> Anyone ever try to read or write directly between EMR <-> Cassandra?
>
> I'm running various Cassandra resources in EC2, so the "physical
> connection" part is pretty easy using security groups.  But, I'm having
> some configuration issues.  I have managed to get Cassandra + Hadoop
> working in the past using a DIY Hadoop cluster, and looking at the
> configurations in the two environments (EMR vs DIY), I'm not sure what's
> different that is causing my failures...  I should probably note I'm using
> the Pig integration of Cassandra.
>
> Versions: Hadoop 1.0.3, Pig 0.10, Cassandra 1.1.7.
>
> I'm 99% sure I have classpaths working (because I didn't at first, and now
> EMR can find and instantiate CassandraStorage on master and slaves).  What
> isn't working are the environment variables.  In my DIY cluster, all I
> needed to do was:
> -------
> export PIG_INITIAL_ADDRESS=XXX
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> ----------
> And the task trackers somehow magically picked up the values (I never
> questioned how/why).  But, in EMR, they do not.  Instead, I get an error
> from CassandraStorage that the initial address isn't set (on the slave, the
> master is ok).
>
> My DIY cluster used CDH3, which was Hadoop 0.20.something.  So, maybe the
> problem is a different version of Hadoop?
>
> Looking at the CassandraStorage class, I realize I have no idea how it
> used to work, since it only seems to look at environment variables.  Those
> values are then set on the Job.getConfiguration object.  I don't know how
> that part of Hadoop works though... do variables that get set on Job on the
> master get propagated to the task threads?  I do know that on my DIY
> cluster, I do NOT set those environment variables on the slaves...
>
> Thanks!
>
> will
>
>
>
