Nope, still fails. I've used Pig for over 5 years. If this is fucking me
up, it must be a total nightmare for our average user. Pig doesn't work out
of the box.
grunt> data = LOAD '/Users/rjurney/Software/foo/data/gsa_feed.xml' USING
org.apache.pig.piggybank.storage.XMLLoader('record') AS (doc:chararray);
grunt> dump data
2015-01-21 21:51:20,159 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2015-01-21 21:51:20,170 [main] INFO org.apache.pig.data.SchemaTupleBackend
- Key [pig.schematuple] was not set... will not generate code.
2015-01-21 21:51:20,171 [main] INFO
org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer -
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator,
GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter,
MergeFilter, MergeForEach, PartitionFilterOptimizer,
PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter,
SplitFilter, StreamTypeCastInserter]}
2015-01-21 21:51:20,178 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2015-01-21 21:51:20,180 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2015-01-21 21:51:20,180 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2015-01-21 21:51:20,193 [main] INFO
org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings
are added to the job
2015-01-21 21:51:20,193 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-01-21 21:51:20,201 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2015-01-21 21:51:20,202 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false,
will not generate code.
2015-01-21 21:51:20,202 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Starting process to move
generated code to distributed cacche
2015-01-21 21:51:20,202 [main] INFO
org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported
or needed in local mode. Setting key [pig.schematuple.local.dir] with code
temp directory:
/var/folders/0b/74l_65015_5fcbmbdz1w2xl40000gn/T/1421905880202-0
2015-01-21 21:51:20,212 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2015-01-21 21:51:20,215 [JobControl] WARN
org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may
not be found. See JobConf(Class) or JobConf#setJar(String).
2015-01-21 21:51:20,238 [JobControl] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1
2015-01-21 21:51:20,238 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2015-01-21 21:51:20,238 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 6
2015-01-21 21:51:20,320 [Thread-6] INFO org.apache.hadoop.mapred.Task -
Using ResourceCalculatorPlugin : null
2015-01-21 21:51:20,330 [Thread-6] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader
- Current split being processed
file:/Users/rjurney/Software/foo/data/gsa_feed.xml:0+33554432
2015-01-21 21:51:20,343 [Thread-6] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
java.lang.IncompatibleClassChangeError: Found class
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
at
org.apache.pig.piggybank.storage.XMLLoader$XMLRecordReader.initialize(XMLLoader.java:102)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:181)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2015-01-21 21:51:20,713 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0002
2015-01-21 21:51:20,713 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases data
2015-01-21 21:51:20,713 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: data[2,7],data[-1,-1] C: R:
2015-01-21 21:51:20,715 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2015-01-21 21:51:20,717 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to
stop immediately on failure.
2015-01-21 21:51:20,717 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0002 has failed! Stop running all dependent jobs
2015-01-21 21:51:20,717 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2015-01-21 21:51:20,717 [main] ERROR
org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce
job(s) failed!
2015-01-21 21:51:20,717 [main] INFO
org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.4 0.14.0 rjurney 2015-01-21 21:51:20 2015-01-21 21:51:20 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0002 data MAP_ONLY Message: Job failed! Error - NA
file:/tmp/temp-476431088/tmp-2144425957,
Input(s):
Failed to read data from "/Users/rjurney/Software/foo/data/gsa_feed.xml"
Output(s):
Failed to produce result in "file:/tmp/temp-476431088/tmp-2144425957"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0002
2015-01-21 21:51:20,717 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2015-01-21 21:51:20,718 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1066: Unable to open iterator for alias data
2015-01-21 21:51:20,718 [main] ERROR org.apache.pig.tools.grunt.Grunt -
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
open iterator for alias data
at org.apache.pig.PigServer.openIterator(PigServer.java:935)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:927)
... 7 more
Details also at logfile: /private/tmp/pig_1421905809654.log
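
For anyone hitting the same IncompatibleClassChangeError: as far as I know,
org.apache.hadoop.mapreduce.TaskAttemptContext is a concrete class in Hadoop
1.x and an interface in Hadoop 2.x. So "Found class ..., but interface was
expected" usually means the calling code (here piggybank's XMLLoader) was
compiled against Hadoop 2 while Hadoop 1 classes are on the runtime
classpath. That would fit the HadoopVersion 1.0.4 in the stats above. Below
is a minimal sketch of a check. The class name ClassKindCheck and the helper
kindOf are made up for illustration, and it only says something about the
Hadoop jars if it is run with the same classpath Pig uses.

```java
public class ClassKindCheck {

    /**
     * Reports whether the named type is an "interface" or a "class" on the
     * current classpath, or "missing" if it cannot be loaded at all.
     */
    public static String kindOf(String className) {
        try {
            // initialize=false: load the type without running static initializers
            Class<?> c = Class.forName(className, false,
                    ClassKindCheck.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Default to the type named in the stack trace; any other fully
        // qualified name can be passed as the first argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.TaskAttemptContext";
        System.out.println(name + " -> " + kindOf(name));
    }
}
```

If this prints "class" for TaskAttemptContext, the runtime jars are Hadoop 1
style, and a piggybank.jar built against that same Hadoop line would
presumably be needed. Without any Hadoop jars on the classpath it just prints
"missing".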
On Wed, Jan 21, 2015 at 9:47 PM, Russell Jurney <[email protected]>
wrote:
> To answer my own question, a typical YARN disaster.
> http://stackoverflow.com/questions/25236766/pig-error-unhandled-internal-error-found-interface-org-apache-hadoop-mapreduc
>
> On Wed, Jan 21, 2015 at 9:44 PM, Russell Jurney <[email protected]>
> wrote:
>
>> Wait... keep in mind there is no Hadoop on my system. How can I be
>> getting Hadoop 1/2 issues? I can't use XMLLoader...
>>
>> grunt> REGISTER
>> /Users/rjurney/Software/pig-0.14.0/contrib/piggybank/java/piggybank.jar
>>
>> grunt>
>>
>> grunt> data = LOAD '/Users/rjurney/Software/foo/data/gsa_feed.xml' USING
>> org.apache.pig.piggybank.storage.XMLLoader('record') AS (doc:chararray);
>>
>> grunt> a = limit data 10;
>>
>> grunt> dump a
>>
>> 2015-01-21 21:42:52,113 [main] INFO
>> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
>> script: LIMIT
>>
>> 2015-01-21 21:42:52,125 [main] WARN
>> org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already
>> been initialized
>>
>> 2015-01-21 21:42:52,126 [main] INFO
>> org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer -
>> {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator,
>> GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter,
>> MergeFilter, MergeForEach, PartitionFilterOptimizer,
>> PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter,
>> SplitFilter, StreamTypeCastInserter]}
>>
>> 2015-01-21 21:42:52,200 [main] INFO
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
>> to process : 1
>>
>> 2015-01-21 21:42:52,200 [main] INFO
>> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
>> paths to process : 1
>>
>> 2015-01-21 21:42:52,204 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 2998: Unhandled internal error. Found class
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
>>
>> 2015-01-21 21:42:52,204 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> java.lang.IncompatibleClassChangeError: Found class
>> org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
>>
>> at
>> org.apache.pig.piggybank.storage.XMLLoader$XMLRecordReader.initialize(XMLLoader.java:102)
>>
>> at
>> org.apache.pig.impl.io.ReadToEndLoader.initializeReader(ReadToEndLoader.java:210)
>>
>> at
>> org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:248)
>>
>> at
>> org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:229)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:137)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNextTuple(POLimit.java:122)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:161)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
>>
>> at
>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:278)
>>
>> at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
>>
>> at
>> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
>>
>> at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
>>
>> at org.apache.pig.PigServer.store(PigServer.java:997)
>>
>> at org.apache.pig.PigServer.openIterator(PigServer.java:910)
>>
>> at
>> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
>>
>> at
>> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
>>
>> at
>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
>>
>> at
>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
>>
>> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
>>
>> at org.apache.pig.Main.run(Main.java:558)
>>
>> at org.apache.pig.Main.main(Main.java:170)
>>
>>
>> Details also at logfile: /private/tmp/pig_1421905319118.log
>>
>>
>>
>> On Wed, Jan 21, 2015 at 9:42 PM, Russell Jurney <[email protected]> wrote:
>>
>>> Not sure what was going on, but I got it working.
>>>
>>> On Wed, Jan 21, 2015 at 9:39 PM, Russell Jurney <
>>> [email protected]> wrote:
>>>
>>>> I am working on a MacBook without Hadoop installed. I downloaded Pig
>>>> 0.14.0 and ran it in local mode:
>>>>
>>>> Russells-MacBook-Pro:pig-0.14.0 rjurney$ bin/pig -l /tmp -v -w -x local
>>>>
>>>>
>>>> I ran the following commands and got this exception. What gives? Why
>>>> doesn't Pig work?
>>>>
>>>> grunt> foo = LOAD '/etc/passwd' USING TextLoader();
>>>>
>>>> 2015-01-21 21:36:24,095 [main] INFO
>>>> org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
>>>> deprecated. Instead, use fs.defaultFS
>>>>
>>>> 2015-01-21 21:36:24,096 [main] INFO
>>>> org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is
>>>> deprecated. Instead, use dfs.bytes-per-checksum
>>>>
>>>> grunt> dump foo
>>>>
>>>> 2015-01-21 21:36:25,701 [main] INFO
>>>> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
>>>> script: UNKNOWN
>>>>
>>>> 2015-01-21 21:36:25,735 [main] INFO
>>>> org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is
>>>> deprecated. Instead, use dfs.bytes-per-checksum
>>>>
>>>> 2015-01-21 21:36:25,737 [main] INFO
>>>> org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
>>>> deprecated. Instead, use fs.defaultFS
>>>>
>>>> 2015-01-21 21:36:25,738 [main] INFO
>>>> org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer -
>>>> {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator,
>>>> GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter,
>>>> MergeFilter, MergeForEach, PartitionFilterOptimizer,
>>>> PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter,
>>>> SplitFilter, StreamTypeCastInserter]}
>>>>
>>>> 2015-01-21 21:36:25,740 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
>>>> File concatenation threshold: 100 optimistic? false
>>>>
>>>> 2015-01-21 21:36:25,743 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>>>> - MR plan size before optimization: 1
>>>>
>>>> 2015-01-21 21:36:25,743 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>>>> - MR plan size after optimization: 1
>>>>
>>>> 2015-01-21 21:36:25,776 [main] INFO
>>>> org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is
>>>> deprecated. Instead, use dfs.bytes-per-checksum
>>>>
>>>> 2015-01-21 21:36:25,777 [main] INFO
>>>> org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
>>>> deprecated. Instead, use fs.defaultFS
>>>>
>>>> 2015-01-21 21:36:25,779 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> ERROR 2998: Unhandled internal error. Bad type on operand stack
>>>>
>>>> Exception Details:
>>>>
>>>> Location:
>>>>
>>>>
>>>> org/apache/hadoop/mapred/JobTrackerInstrumentation.create(Lorg/apache/hadoop/mapred/JobTracker;Lorg/apache/hadoop/mapred/JobConf;)Lorg/apache/hadoop/mapred/JobTrackerInstrumentation;
>>>> @5: invokestatic
>>>>
>>>> Reason:
>>>>
>>>> Type 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' (current
>>>> frame, stack[2]) is not assignable to
>>>> 'org/apache/hadoop/metrics2/MetricsSystem'
>>>>
>>>> Current Frame:
>>>>
>>>> bci: @5
>>>>
>>>> flags: { }
>>>>
>>>> locals: { 'org/apache/hadoop/mapred/JobTracker',
>>>> 'org/apache/hadoop/mapred/JobConf' }
>>>>
>>>> stack: { 'org/apache/hadoop/mapred/JobTracker',
>>>> 'org/apache/hadoop/mapred/JobConf',
>>>> 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' }
>>>>
>>>> Bytecode:
>>>>
>>>> 0000000: 2a2b b200 03b8 0004 b0
>>>>
>>>>
>>>> 2015-01-21 21:36:25,780 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> java.lang.VerifyError: Bad type on operand stack
>>>>
>>>> Exception Details:
>>>>
>>>> Location:
>>>>
>>>>
>>>> org/apache/hadoop/mapred/JobTrackerInstrumentation.create(Lorg/apache/hadoop/mapred/JobTracker;Lorg/apache/hadoop/mapred/JobConf;)Lorg/apache/hadoop/mapred/JobTrackerInstrumentation;
>>>> @5: invokestatic
>>>>
>>>> Reason:
>>>>
>>>> Type 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' (current
>>>> frame, stack[2]) is not assignable to
>>>> 'org/apache/hadoop/metrics2/MetricsSystem'
>>>>
>>>> Current Frame:
>>>>
>>>> bci: @5
>>>>
>>>> flags: { }
>>>>
>>>> locals: { 'org/apache/hadoop/mapred/JobTracker',
>>>> 'org/apache/hadoop/mapred/JobConf' }
>>>>
>>>> stack: { 'org/apache/hadoop/mapred/JobTracker',
>>>> 'org/apache/hadoop/mapred/JobConf',
>>>> 'org/apache/hadoop/metrics2/lib/DefaultMetricsSystem' }
>>>>
>>>> Bytecode:
>>>>
>>>> 0000000: 2a2b b200 03b8 0004 b0
>>>>
>>>>
>>>> at
>>>> org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:420)
>>>>
>>>> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:472)
>>>>
>>>> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
>>>>
>>>> at
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:163)
>>>>
>>>> at
>>>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
>>>>
>>>> at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
>>>>
>>>> at
>>>> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
>>>>
>>>> at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
>>>>
>>>> at org.apache.pig.PigServer.store(PigServer.java:997)
>>>>
>>>> at org.apache.pig.PigServer.openIterator(PigServer.java:910)
>>>>
>>>> at
>>>> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
>>>>
>>>> at
>>>> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
>>>>
>>>> at
>>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
>>>>
>>>> at
>>>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
>>>>
>>>> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
>>>>
>>>> at org.apache.pig.Main.run(Main.java:558)
>>>>
>>>> at org.apache.pig.Main.main(Main.java:170)
>>>>
>>>>
>>>> Details also at logfile: /private/tmp/pig_1421904926350.log
>>>>
>>>> --
>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>> datasyndrome.com
--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com