I have not had any success getting RANK to work in Pig 0.12.1.
Here is my script:
-------
InputData = LOAD '$in' USING PigStorage('\u0001') AS (a1:chararray,
a2:chararray, score:float);
Ranked = RANK InputData BY score DESC DENSE;
OutputData = FOREACH Ranked GENERATE
rank_InputData AS rank,
a1 AS a1,
score AS score;
STORE OutputData INTO '$out' using PigStorage('\u0001');
-------
I've run this with two different versions of input:
1. $in contains 4700 input paths
2. $in contains 60 input paths
These are the same data (280M rows, 25GB), except that the second input has
the data aggregated into a smaller number of files. This was motivated by a
thread about a counters-per-mapper issue:
<http://mail-archives.apache.org/mod_mbox/pig-user/201304.mbox/%3ccakct+0atdsbbh-xx8zf5+br6uxsmaauckgaznxmyujfwnt_...@mail.gmail.com%3E>
The result ("Java heap space" error) is the same for both inputs:
-------
Backend error message
---------------------
Error: Java heap space
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:148)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:478)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
-------
I have also tried another version of the script that uses ORDER ... BY to
sort the data first, so that RANK only has to number the rows:
-------
InputData = LOAD '$in' USING PigStorage('\u0001') AS (a1:chararray,
a2:chararray, score:float);
InputDataOrdered = ORDER InputData BY score DESC PARALLEL 60;
Ranked = RANK InputDataOrdered;
OutputData = FOREACH Ranked GENERATE
rank_InputDataOrdered AS rank,
a1 AS a1,
score AS score;
STORE OutputData INTO '$out' using PigStorage('\u0001');
-------
Both inputs give the same error, this time "too many counters":
-------
Pig Stack Trace
---------------
ERROR 2043: Unexpected error during execution.
org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
at org.apache.pig.PigServer.launchPlan(PigServer.java:1335)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1309)
at org.apache.pig.PigServer.execute(PigServer.java:1299)
at org.apache.pig.PigServer.executeBatch(PigServer.java:377)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:478)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:388)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:334)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:388)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1324)
... 15 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Error reading responses; Host Details : local host is: "dataproc1001.ds.uh1.inmobi.com/10.45.9.70"; destination host is: "glgm1003.grid.uh1.inmobi.com":54311;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
at org.apache.hadoop.ipc.Client.call(Client.java:1229)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at org.apache.hadoop.mapred.$Proxy11.getJobCounters(Unknown Source)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:452)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:492)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:361)
... 18 more
Caused by: java.io.IOException: Error reading responses
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:843)
Caused by: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:61)
at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:68)
at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:174)
at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:278)
at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:303)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:952)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:836)
-------
Are these known issues? Have they been fixed in later versions? I can't
see any related JIRA tickets.
Many thanks!