Hi, I'm stuck on a problem.
Here is part of the code:
itemGrp = GROUP itemProj1 BY sale_id PARALLEL 12;
notFiltered = FOREACH itemGrp {
    -- project (id, other_id) from the bag of rows in the current sale_id group
    itemProj2 = FOREACH itemProj1
                GENERATE FLATTEN(TOTUPLE(id, other_id)) as (id, other_id);
    -- pair every row of the group with every (id, other_id) of the same group
    crossed = CROSS itemProj1, itemProj2;
    filtered = FILTER crossed by (
        --some cond
    );
    projected = FOREACH filtered GENERATE f1, f2, f3;
    GENERATE FLATTEN(projected) as (f1, f2, f3);
}
The problem is that all of this is executed in the map phase, but I want it
to run in the reduce phase to benefit from parallelism.
Right now only two mappers (there is not much data before the CROSS explosion)
perform the per-group cross and the complicated filtering.
I can't find a way to make it run in the reduce phase.
What am I doing wrong?