Hello Pig experts,

I have the simple script below. To keep things short, I have replaced my real 
UDF with a dummy UDF that reproduces the problem. The UDF, TupleTest, generates 
a tuple as follows:

    boolean randomboolean = rngen.nextBoolean();

    if (randomboolean) {
        output.set(0, 1);
        output.set(1, "Black");
    } else {
        output.set(0, 0);
        output.set(1, "White");
    }
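For anyone who wants to try the tuple logic without a Pig installation, here is a minimal standalone sketch of the same branch. It is not the real UDF: the class name TupleTestSketch and the method makeTuple are hypothetical, and a plain List<Object> stands in for Pig's Tuple so that the code runs with only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the UDF body. A List<Object> replaces Pig's Tuple
// (an assumption made only so the sketch runs without Pig on the classpath).
public class TupleTestSketch {

    // Builds a two-field "tuple": (1, "Black") or (0, "White"), chosen at random.
    static List<Object> makeTuple(Random rngen) {
        List<Object> output = new ArrayList<>();
        output.add(null); // field 0: randbool
        output.add(null); // field 1: randstr

        if (rngen.nextBoolean()) {
            output.set(0, 1);
            output.set(1, "Black");
        } else {
            output.set(0, 0);
            output.set(1, "White");
        }
        return output;
    }

    public static void main(String[] args) {
        Random rngen = new Random();
        // Each call draws a fresh random value, so repeated runs differ.
        for (int i = 0; i < 4; i++) {
            System.out.println(makeTuple(rngen));
        }
    }
}
```

Because the generator is unseeded, two runs of this sketch will generally print different sequences.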


Pig script:

REGISTER /N/u/sameer/software/pig-0.11.1/myudfs.jar

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, 
value: chararray);

AU = FOREACH A GENERATE FLATTEN(myudfs.TupleTest(key, value)) AS (randbool: 
int, randstr: chararray);
STORE AU into '/scratch/AU';

B = GROUP AU BY randbool;
STORE B into '/scratch/B';

X = FOREACH B GENERATE group, COUNT(AU);
DUMP X;


Here is the sample output:

hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-00000
Warning: $HADOOP_HOME is deprecated.

1    Black
1    Black
0    White
1    Black

hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/B/part-r-00000
Warning: $HADOOP_HOME is deprecated.

0    {(0,White)}
1    {(1,Black),(1,Black),(1,Black)}

Output of DUMP X:
(0,2)
(1,2)

As you can see, X is wrong. Based on the contents of B, it should be (0,1) and 
(1,3). Can you please help me with this?
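To make the expected numbers explicit, here is a small sketch that counts the randbool column of the four AU rows listed above. The class name ExpectedCounts and the method countByKey are hypothetical helpers used only for this check; they are not part of the Pig script.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how often each key occurs, which is what
// GROUP ... BY randbool followed by COUNT is expected to report.
public class ExpectedCounts {

    static Map<Integer, Long> countByKey(int[] keys) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int k : keys) {
            counts.merge(k, 1L, Long::sum); // add 1 to the running count for k
        }
        return counts;
    }

    public static void main(String[] args) {
        // randbool values copied from the AU listing: 1, 1, 0, 1
        int[] randbool = {1, 1, 0, 1};
        System.out.println(countByKey(randbool)); // prints {0=1, 1=3}
    }
}
```

This agrees with the bags stored in B (one row for 0, three rows for 1), but not with what DUMP X printed.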

                                          
