Hello,
I am trying to use a merge join to speed up a join that finds the friends
of a list of persons in a social network, but in my trial I get incorrect
output.
Could you give me some advice about what I have done wrong?
Here is my code:
```
/* Find friends of sample, merge join
output: (id, frd)
*/
DEFINE seek_friend_merge(rel, samp) RETURNS samp_frd {
jnd = JOIN $rel BY src, $samp BY id USING 'merge';
$samp_frd = FOREACH jnd GENERATE $samp::id AS id, $rel::dst AS frd;
};
/* Sort data */
DEFINE sort_by_key(data, key, reducer) RETURNS sorted {
$sorted = ORDER $data BY $key PARALLEL $reducer;
};
-- Load data
rel = LOAD '$REL' AS (src: LONG, dst: LONG);
samp = LOAD '$SAMPLE' AS (id: LONG);
-- Step 1, pre-sort data
rel_sorted = sort_by_key(rel, src, $PARALLEL_FACTOR);
STORE rel_sorted INTO '$SORT_TMP';
-- Step 2, merge join
rel_sorted = LOAD '$SORT_TMP' AS (src: LONG, dst: LONG);
samp_sort = sort_by_key(samp, id, 100);
samp_frd = seek_friend_merge(rel_sorted, samp_sort);
STORE samp_frd INTO '$SAMP_FRD';
```
Here *rel* is a big table and *samp* is pretty small.
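For comparison, a default (hash) join over the same unsorted inputs should produce the reference result that the merge join output can be checked against. This is a minimal sketch, assuming the `rel` and `samp` aliases loaded above; `jnd_ref`, `samp_frd_ref` and `$SAMP_FRD_REF` are hypothetical names that are not part of the original script:

```pig
-- Reference result using the default join; no sort order is required.
-- jnd_ref, samp_frd_ref and $SAMP_FRD_REF are hypothetical names.
jnd_ref = JOIN rel BY src, samp BY id;
samp_frd_ref = FOREACH jnd_ref GENERATE samp::id AS id, rel::dst AS frd;
STORE samp_frd_ref INTO '$SAMP_FRD_REF';
```

Row order may differ between the two outputs, so the comparison should be made on the set of (id, frd) pairs.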
Alcaid