Hello, I'm looking for some clues to help me fix an annoying error I'm getting using Pig.
I need to parse a large JSON file, so I grabbed kimsterv's JSON loader (https://gist.github.com/601331), compiled it, and successfully tested it on my laptop via -x local. However, when I try to run it on the edge node of our dev Hadoop instance, I am unable to get it to work, even if I run it in -x local. I get "org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for test.json".

I looked through the mailing list for this message, only to find a mention of it being related to LZO compression issues. I'm not using any file compression, and this error still occurs when running in -x local on the edge node of the dev cluster. Are there some environment variables I'm missing? Maybe some permissions issues I'm unaware of? Suggestions and theories welcome!

Hadoop version: Hadoop 0.20.2+737
Pig version: 0.7.0+16 (compiled against the pig 0.7.0 jar)

Command line:
java -cp '/usr/lib/pig/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:libs/*:.' org.apache.pig.Main -v -x local json.pig

Pig script:
REGISTER /home/geoffeg/pig-functions/jsontester.jar;
-- file:// should specify the local FS, remove file:// to specify HDFS
A = LOAD 'file://home/geoffeg/test.json' using org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );
B = foreach A generate json#'_keyword';
DUMP B;

Full error/log:
2011-01-09 22:33:29,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Map key required for A: $0->[_keyword]
2011-01-09 22:33:30,455 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1814319995/tmp1141533149:org.apache.pig.builtin.BinStorage) - 1-36 Operator Key: 1-36)
2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-01-09 22:33:30,517 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-01-09 22:33:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-09 22:33:32,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-01-09 22:33:32,552 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-09 22:33:32,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-01-09 22:33:32,562 [Thread-2] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-01-09 22:33:32,692 [Thread-2] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area file:/tmp/hadoop-geoffeg/mapred/staging/geoffeg395595954/.staging/job_local_0001
2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-01-09 22:33:33,054 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2011-01-09 22:33:33,064 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1814319995/tmp1141533149"
2011-01-09 22:33:33,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : Unable to determine number of records written
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : Unable to determine number of bytes written
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-01-09 22:33:33,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
2011-01-09 22:33:33,134 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
    at org.apache.pig.PigServer.openIterator(PigServer.java:607)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:545)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:414)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:270)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1007)
    at org.apache.pig.PigServer.store(PigServer.java:697)
    at org.apache.pig.PigServer.openIterator(PigServer.java:590)
    ... 6 more

-- 
Sent from my email client.
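
P.S. One thing I have not ruled out is the LOAD path itself. The sketch below is not part of the loader or of Pig; it only uses the standard java.net.URI class to show how the location string from my script splits into scheme, authority, and path, next to a three-slash variant for comparison. The class name UriCheck and the helper describe are made up for this check, and I am only assuming that the way the path is parsed could matter here. I have not confirmed that it is what triggers ERROR 2118.

```java
import java.net.URI;

// Hypothetical helper, not part of Pig or the JSON loader.
public class UriCheck {

    // Breaks a location string into the pieces java.net.URI sees.
    static String describe(String location) {
        URI uri = URI.create(location);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // The location exactly as written in the Pig script.
        System.out.println(describe("file://home/geoffeg/test.json"));
        // The same location with an empty authority (three slashes).
        System.out.println(describe("file:///home/geoffeg/test.json"));
    }
}
```

With two slashes, java.net.URI reports "home" as the authority and /geoffeg/test.json as the path. With three slashes, the authority is null and the path is /home/geoffeg/test.json. Whether Pig or Hadoop resolves the location the same way on the edge node is something I still need to verify.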
