I am sure it's not that. The ORDER command is what fails. If I remove the ORDER line, the same script runs just fine, except that the result is not in order.
On Sat, Apr 13, 2013 at 4:54 PM, Prasanth J <[email protected]> wrote:

> From the error logs, it seems like the input file doesn't exist or is not
> accessible:
>
>   Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>   Input path does not exist:
>   file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
>
> Can you please check whether the input path in $LOGS is correct?
>
> Thanks
> -- Prasanth
>
> On Apr 12, 2013, at 11:02 PM, Lei Liu <[email protected]> wrote:
>
>> Hi, I am using Pig to compute the percentage of each UserAgent in an
>> Apache log. The following program fails because of the ORDER command at
>> the very end (the result alias is correct and can be dumped without
>> problems). I am relatively new to Pig and could not figure this out, so
>> I need your help. The program and the error messages follow. Thanks!
>>
>> logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen,
>>     user, time, method, uri, protocol, statusCode, responseSize, referer,
>>     userAgent);
>>
>> uarows = FOREACH logs GENERATE userAgent;
>> total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) AS count;
>> dump total;
>>
>> gpuarows = GROUP uarows BY userAgent;
>> result = FOREACH gpuarows {
>>     subtotal = COUNT(uarows);
>>     GENERATE FLATTEN(group) AS ua, subtotal AS SUB_TOTAL,
>>         100 * (double)subtotal / (double)total.count AS percentage;
>> };
>> orderresult = ORDER result BY SUB_TOTAL DESC;
>> dump orderresult;
>>
>> -- what's weird is that 'dump result' works just fine, so it's the ORDER
>> line that makes trouble
>>
>> Errors:
>>
>> 2013-04-13 10:36:32,409 [Thread-48] INFO org.apache.hadoop.mapred.MapTask
>>   - record buffer = 262144/327680
>> 2013-04-13 10:36:32,437 [Thread-48] WARN
>>   org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
>> java.lang.RuntimeException:
>>   org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>>   does not exist:
>>   file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
>>   Input path does not exist:
>>   file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_259943398_1365820592017
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>>     at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
>>     at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
>>     ... 6 more
>> 2013-04-13 10:36:32,525 [main] INFO
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - HadoopJobId: job_local_0005
>> 2013-04-13 10:36:32,526 [main] INFO
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - Processing aliases orderresult
>> 2013-04-13 10:36:32,526 [main] INFO
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - detailed locations: M: orderresult[19,14] C: R:
>> 2013-04-13 10:36:37,536 [main] WARN
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig
>>   to stop immediately on failure.
>> 2013-04-13 10:36:37,536 [main] INFO
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - job job_local_0005 has failed! Stop running all dependent jobs
>> 2013-04-13 10:36:37,536 [main] INFO
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - 100% complete
>> 2013-04-13 10:36:37,537 [main] ERROR
>>   org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>> 2013-04-13 10:36:37,538 [main] INFO
>>   org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>>
>> HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
>> 1.0.4          0.11.0      dliu    2013-04-13 10:35:50  2013-04-13 10:36:37  GROUP_BY,ORDER_BY
>>
>> Some jobs have failed! Stop running all dependent jobs
>>
>> Job Stats (time in seconds):
>> JobId           Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias                   Feature               Outputs
>> job_local_0002  1     1        n/a         n/a         n/a         n/a            n/a            n/a            1-18,logs,total,uarows  MULTI_QUERY,COMBINER
>> job_local_0003  1     1        n/a         n/a         n/a         n/a            n/a            n/a            gpuarows,result         GROUP_BY,COMBINER
>> job_local_0004  1     1        n/a         n/a         n/a         n/a            n/a            n/a            orderresult             SAMPLER
>>
>> Failed Jobs:
>> JobId           Alias        Feature   Message                          Outputs
>> job_local_0005  orderresult  ORDER_BY  Message: Job failed! Error - NA  file:/tmp/temp-1225021115/tmp-62411972,
>>
>> Input(s):
>> Successfully read 0 records from:
>>   "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
>>
>> Output(s):
>> Failed to produce result in "file:/tmp/temp-1225021115/tmp-62411972"
>>
>> Counters:
>> Total records written : 0
>> Total bytes written : 0
>> Spillable Memory Manager spill count : 0
>> Total bags proactively spilled: 0
>> Total records proactively spilled: 0
>>
>> Job DAG:
>> job_local_0002 -> job_local_0003,
>> job_local_0003 -> job_local_0004,
>> job_local_0004 -> job_local_0005,
>> job_local_0005
>>
>> 2013-04-13 10:36:37,539 [main] INFO
>>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>   - Some jobs have failed! Stop running all dependent jobs
>> 2013-04-13 10:36:37,541 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>   ERROR 1066: Unable to open iterator for alias orderresult
>> Details at logfile:
>>   /home/dliu/ApacheLogAnalysisWithPig/pig_1365820535568.log
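[Editor's note: for readers less familiar with Pig, the data flow the script describes (an overall count via GROUP ALL, per-UserAgent subtotals via GROUP BY, a percentage per group, then a descending sort on the subtotal) can be modeled by the small Python sketch below. The sample data and variable names are made up for illustration; this only mirrors the logic of the script, it is not a workaround for the ORDER failure.]

```python
from collections import Counter

# Hypothetical stand-in for the userAgent column projected by `uarows`
# (not data from the actual access.log).
user_agents = ["Mozilla", "curl", "Mozilla", "Googlebot", "Mozilla"]

# Like `total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows)`.
total = len(user_agents)

# Like `GROUP uarows BY userAgent` followed by COUNT per group.
subtotals = Counter(user_agents)

# Like the nested FOREACH: (ua, SUB_TOTAL, percentage) per group.
result = [(ua, n, 100.0 * n / total) for ua, n in subtotals.items()]

# Like `ORDER result BY SUB_TOTAL DESC`.
orderresult = sorted(result, key=lambda row: row[1], reverse=True)

for ua, n, pct in orderresult:
    print(ua, n, pct)
```

In Pig, that last sort is the only step that triggers an extra sampling job (the SAMPLER feature visible in the job stats above), which is why the script succeeds up to `dump result` and only fails at `orderresult`.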
