A simple pig script to get the distinct elements of a gouped bag.
data = LOAD .............;
data_pro = FOREACH data GENERATE custid, ip, userid;
data_group = GROUP data_pro BY custid;
data_group_pro = FOREACH data_group GENERATE group, Distinct(data_pro.userid),
Distinct(data_pro.ip);
But there's error:
java.lang.ClassCastException: org.apache.pig.data.SingleTupleBag cannot be cast
to org.apache.pig.data.Tuple
at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:86)
at org.apache.pig.builtin.Distinct$Initial.exec(Distinct.java:75)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:376)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:354)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:263)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
I am using pig 0.11 which is packaged in chd4.3.2, where i can see the source
code:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.pig/pig/0.11.0-cdh4.3.2/org/apache/pig/builtin/Distinct.java
Any insight on this?Thanks,Lei
[email protected]