Hi Guys,
I am using Pig for data processing, but the JOIN operator does not seem to work
in my case.
The Pig script is as follows:
A = LOAD './q' USING PigStorage(',') AS (ori_query: chararray, t: chararray, w: chararray);
B = LOAD './word' USING PigStorage('\t') AS (word: chararray, proID: chararray, proScore: chararray);
C = JOIN A by t, B by word;
--DUMP C;
STORE C INTO 'join_out';
First I load my test file 'q' into A, and then my test file 'word' into B.
With "JOIN A by t, B by word", I expect an inner join of A's field 't' with
B's field 'word'. My test files contain many values that appear in both A.t
and B.word.
However, the result C is empty, and the output file is empty as well.
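To narrow this down, one check I can think of is to look at the schema and a
few rows of each relation before the join, to confirm that the fields are
parsed the way I expect. This is only a minimal sketch, assuming the aliases A
and B from the script above; A_sample and B_sample are names introduced here
for illustration and are not part of my original script.

```pig
-- Diagnostic sketch, not part of the original script.
-- A_sample and B_sample are illustrative alias names.

-- Print the declared schema of each relation.
DESCRIBE A;
DESCRIBE B;

-- Take a handful of rows from each relation and print them.
A_sample = LIMIT A 5;
B_sample = LIMIT B 5;

DUMP A_sample;
DUMP B_sample;
```

If a join key in the dumped tuples holds a whole input line, or the other
fields print as empty, that would suggest the delimiter passed to PigStorage
does not match the file, in which case the keys could never be equal.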
Here is a small sample of 'q' (the full file is attached):
dark shoes for lady,dark,3.234
dark shoes for lady,shoes,2.261
dark shoes for lady,for,1.223
dark shoes for lady,lady,2.345
casual male shoes,casual,3.478
casual male shoes,male,2.675
casual male shoes,shoes,4.265
casual sporty,casual,2.678
Here is a small sample of 'word' (the full file is attached):
for,104365130,0.588235294118
male,104365130, 0.588235294118
35,104365130,0.588235294118
ar,104365132,0.588235294118
cow,104365132,0.652521008403
mm,104365132,0.588235294118
45109,104365135,0.588235294118
medium,104365135,0.588235294118
casual,104365135,0.588235294118
fur,104365135,0.652521008403
lady,104365135,0.652521008403
shoes,104365135,0.6
st,104366010,0.533333333333
ad,104366010,0.533333333333
ray,104366010,0.597619047619
chic,104366010,0.533333333333
d,104394306,0.519480519481
dark,104394306,0.519480519481
comf,104394306,0.574358568261
casual,104394306,0.574358568261
sporty,104394306,0.574358568261
PEACEPRINCESS,104394306,0.0
shoes,104394889,1.15914601533
A.t and B.word are both declared as chararray. The complete test files 'q' and
'word' are included in the attachment.
Does anyone have an idea why JOIN is not working here?
Many thanks,
Chloe H