Hi, I am using Pig to analyze the percentage of each UserAgents from an
apache log. The following program failed because of ORDER command at the
very last (the result variable is correct and can be dumped out correctly).
I am relative new to Pig and could not figure it out so need you guys to
help. Following is the program and error message. Thanks!
logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen,
user, time, method, uri, protocol, statusCode, responseSize, referer,
userAgent);
uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total;
gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
subtotal = COUNT(uarows);
GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL,
100*(double)subtotal/(double)total.count AS percentage;
};
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;
-- what's weird is that 'dump result' works just fine, so it's the ORDER
line makes trouble
Errors:
2013-04-13 10:36:32,409 [Thread-48] INFO org.apache.hadoop.mapred.MapTask
- record buffer = 262144/327680
2013-04-13 10:36:32,437 [Thread-48] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
does not exist:
file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist:
file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
at
org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
... 6 more
2013-04-13 10:36:32,525 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0005
2013-04-13 10:36:32,526 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases orderresult
2013-04-13 10:36:32,526 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: orderresult[19,14] C: R:
2013-04-13 10:36:37,536 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to
stop immediately on failure.
2013-04-13 10:36:37,536 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 10:36:37,536 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2013-04-13 10:36:37,537 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 10:36:37,538 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.4 0.11.0 dliu 2013-04-13 10:35:50 2013-04-13 10:36:37
GROUP_BY,ORDER_BY
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime
MedianReducetime Alias Feature Outputs
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a
1-18,logs,total,uarows MULTI_QUERY,COMBINER
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a
gpuarows,result GROUP_BY,COMBINER
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a
orderresult SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0005 orderresult ORDER_BY Message: Job failed! Error -
NA file:/tmp/temp-1225021115/tmp-62411972,
Input(s):
Successfully read 0 records from:
"file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
Output(s):
Failed to produce result in "file:/tmp/temp-1225021115/tmp-62411972"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0002 -> job_local_0003,
job_local_0003 -> job_local_0004,
job_local_0004 -> job_local_0005,
job_local_0005
2013-04-13 10:36:37,539 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Some jobs have failed! Stop running all dependent jobs
2013-04-13 10:36:37,541 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile:
/home/dliu/ApacheLogAnalysisWithPig/pig_1365820535568.log