Re: pig + hadoop

pob Wed, 20 Apr 2011 03:33:40 -0700

My mistake, please ignore my last post.



2011/4/20 pob <peterob...@gmail.com>

> Hi,
>
> everything works fine with Cassandra 0.7.5, but when I tried 0.7.3 other
> errors showed up. Strangely, the task still finished successfully.
>
>
> 2011-04-20 11:45:40,674 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_201104201139_0004_m_000000_3: Error:
> java.lang.ClassNotFoundException: org.apache.thrift.TException
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:247)
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
>
> 2011-04-20 11:45:43,629 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_201104201139_0004_m_000001_3:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2044: The type
> null cannot be collected as a Key type
>         at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:143)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:105)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
> 2011-04-20 11:42:49,498 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> from attempt_201104201139_0001_m_000000_1: Error:
> java.lang.ClassNotFoundException: org.apache.commons.lang.ArrayUtils
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>         at org.apache.cassandra.utils.ByteBufferUtil.<clinit>(ByteBufferUtil.java:75)
>         at org.apache.cassandra.hadoop.pig.CassandraStorage.<clinit>(Unknown Source)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:247)
>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
>
>
>
> 2011/4/20 Jeremy Hanna <jeremy.hanna1...@gmail.com>
>
>> Just as an example:
>>
>>  <property>
>>    <name>cassandra.thrift.address</name>
>>    <value>10.12.34.56</value>
>>  </property>
>>  <property>
>>    <name>cassandra.thrift.port</name>
>>    <value>9160</value>
>>  </property>
>>  <property>
>>    <name>cassandra.partitioner.class</name>
>>    <value>org.apache.cassandra.dht.RandomPartitioner</value>
>>  </property>
>>
>>
>> On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:
>>
>> > Oh yeah, that's what's going on. On the machine that I run the Pig
>> > script from, I set the PIG_CONF variable to my HADOOP_HOME/conf
>> > directory, and in the mapred-site.xml file found there I set the three
>> > variables.
>> >
>> > I don't use environment variables when I run against a cluster.
>> >
>> > On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:
>> >
>> >> Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error
>> >> for a while before I added that.
>> >>
>> >> -Jeffrey
>> >>
>> >> From: pob [mailto:peterob...@gmail.com]
>> >> Sent: Tuesday, April 19, 2011 6:42 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: pig + hadoop
>> >>
>> >> Hey Aaron,
>> >>
>> >> I read it, and all three env variables were exported. The results are
>> >> the same.
>> >>
>> >> Best,
>> >> P
>> >>
>> >> 2011/4/20 aaron morton <aa...@thelastpickle.com>
>> >> I'm guessing, but here goes. It looks like the Cassandra RPC port is not
>> >> set. Did you follow these steps in contrib/pig/README.txt?
>> >>
>> >> Finally, set the following as environment variables (uppercase,
>> >> underscored), or as Hadoop configuration variables (lowercase, dotted):
>> >> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
>> >> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
>> >> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>> >>
>> >> Hope that helps.
>> >> Aaron
>> >>
>> >>
>> >> On 20 Apr 2011, at 11:28, pob wrote:
>> >>
>> >>
>> >> Hello,
>> >>
>> >> I configured the cluster following
>> >> http://wiki.apache.org/cassandra/HadoopSupport. When I run
>> >> pig example-script.pig -x local, everything is fine and I get correct
>> >> results.
>> >>
>> >> The problem occurs with -x mapreduce.
>> >>
>> >> I'm getting these errors:
>> >>
>> >>
>> >> 2011-04-20 01:24:21,791 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR: java.lang.NumberFormatException: null
>> >> 2011-04-20 01:24:21,792 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> >> 2011-04-20 01:24:21,793 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>> >>
>> >> Input(s):
>> >> Failed to read data from "cassandra://Keyspace1/Standard1"
>> >>
>> >> Output(s):
>> >> Failed to produce result in "hdfs://ip:54310/tmp/temp-1383865669/tmp-1895601791"
>> >>
>> >> Counters:
>> >> Total records written : 0
>> >> Total bytes written : 0
>> >> Spillable Memory Manager spill count : 0
>> >> Total bags proactively spilled: 0
>> >> Total records proactively spilled: 0
>> >>
>> >> Job DAG:
>> >> job_201104200056_0005   ->      null,
>> >> null    ->      null,
>> >> null
>> >>
>> >>
>> >> 2011-04-20 01:24:21,793 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
>> >> 2011-04-20 01:24:21,803 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias topnames. Backend error : java.lang.NumberFormatException: null
>> >>
>> >>
>> >>
>> >> ====
>> >> That's from the jobtasks web management page, the error from the task directly:
>> >>
>> >> java.lang.RuntimeException: java.lang.NumberFormatException: null
>> >> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:123)
>> >> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:176)
>> >> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>> >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>> >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> >> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> >> Caused by: java.lang.NumberFormatException: null
>> >> at java.lang.Integer.parseInt(Integer.java:417)
>> >> at java.lang.Integer.parseInt(Integer.java:499)
>> >> at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>> >> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:105)
>> >> ... 5 more
>> >>
>> >>
>> >>
>> >> Any suggestions as to where the problem might be?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >>
>> >
>>
>>
>
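
A note on the quoted thread: the "NumberFormatException: null" in the original trace is raised by Integer.parseInt inside ConfigHelper.getRpcPort, which is what you would expect if the port setting never reached the task. Below is a minimal Java sketch of the lookup the README describes. It assumes the Hadoop-style key is checked first and the environment-style name second; the thread does not say which one really wins. ThriftSettings, lookup and resolve are hypothetical names for illustration, not Cassandra or Pig APIs, and the sample values are the ones from Jeremy's example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Cassandra or Pig.
final class ThriftSettings {
    final String address;
    final int port;
    final String partitioner;

    private ThriftSettings(String address, int port, String partitioner) {
        this.address = address;
        this.port = port;
        this.partitioner = partitioner;
    }

    // Assumption: try the Hadoop-style key first, then the environment-style name.
    // Fails with a message naming both, instead of passing null further down.
    static String lookup(Map<String, String> conf, Map<String, String> env,
                         String confKey, String envKey) {
        String value = conf.get(confKey);
        if (value == null || value.trim().isEmpty()) {
            value = env.get(envKey);
        }
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(
                "Neither " + confKey + " nor " + envKey + " is set");
        }
        return value.trim();
    }

    static ThriftSettings resolve(Map<String, String> conf, Map<String, String> env) {
        String address = lookup(conf, env, "cassandra.thrift.address", "PIG_INITIAL_ADDRESS");
        String portText = lookup(conf, env, "cassandra.thrift.port", "PIG_RPC_PORT");
        String partitioner = lookup(conf, env, "cassandra.partitioner.class", "PIG_PARTITIONER");
        // portText is guaranteed non-null here, so parseInt only fails on a malformed number.
        return new ThriftSettings(address, Integer.parseInt(portText), partitioner);
    }

    public static void main(String[] args) {
        // Values copied from the mapred-site.xml example quoted in the thread.
        Map<String, String> conf = new HashMap<>();
        conf.put("cassandra.thrift.address", "10.12.34.56");
        conf.put("cassandra.thrift.port", "9160");
        conf.put("cassandra.partitioner.class", "org.apache.cassandra.dht.RandomPartitioner");

        ThriftSettings s = resolve(conf, System.getenv());
        System.out.println(s.address + ":" + s.port + " " + s.partitioner);

        // What the original trace shows: parsing an unset (null) port value.
        try {
            Integer.parseInt((String) null);
        } catch (NumberFormatException e) {
            System.out.println("unset port -> " + e);
        }
    }
}
```

Checking for the missing value before parsing turns the opaque "null" message into one that names the setting to fix.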

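The ClassNotFoundException entries in the quoted 0.7.3 log mean the task JVM could not load org.apache.thrift.TException and org.apache.commons.lang.ArrayUtils. A rough way to see which classes a given classpath can actually load is a small probe like the one below. ClasspathProbe and missing are hypothetical names for this sketch, and running it only tells you about the classpath it is run with, not about what the Hadoop task nodes see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic for illustration only; not part of Hadoop, Pig or Cassandra.
final class ClasspathProbe {

    // Returns the subset of classNames that this class's loader cannot load.
    static List<String> missing(String... classNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be located and
                // loaded, without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Class names taken from the ClassNotFoundException lines in the quoted log.
        List<String> notFound = missing(
            "org.apache.thrift.TException",
            "org.apache.commons.lang.ArrayUtils");
        if (notFound.isEmpty()) {
            System.out.println("all classes loadable");
        } else {
            System.out.println("not loadable: " + notFound);
        }
    }
}
```

Run it with the same jars on the classpath that you expect the tasks to get; any name it prints points at a jar that is absent from that set.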