ok, thanks!
2014/1/20 Pradeep Gollakota <[email protected]>

> It's strange that it's being executed on the Map-side. The group is a
> reduce-side operation (I'm assuming), and it seems that the nested foreach
> would happen on the Reduce-side after grouping. Have you looked at the MR
> plan to verify that it is being executed Map-side?
>
> One thing to try might be to CROSS first before grouping... although that
> might be 2 reduce steps.
>
>
> On Mon, Jan 20, 2014 at 1:27 AM, Serega Sheypak <[email protected]> wrote:
>
> > Hi, I'm in trouble.
> > Here is part of the code:
> >
> > itemGrp = GROUP itemProj1 BY sale_id PARALLEL 12;
> > notFiltered = FOREACH itemGrp {
> >     itemProj2 = FOREACH itemProj1
> >                 GENERATE FLATTEN(
> >                     TOTUPLE(id, other_id)) as
> >                     (id, other_id);
> >
> >     crossed = CROSS itemProj1, itemProj2;
> >     filtered = FILTER crossed by (
> >         --some cond
> >     );
> >     projected = FOREACH filtered GENERATE f1, f2, f3;
> >     GENERATE FLATTEN(projected) as (f1, f2, f3);
> > }
> >
> > The problem is that all this stuff is executed in the map phase. But I
> > want it to be executed in the reduce phase to get the benefit of
> > parallelism.
> > Right now only two mappers (there is not too much data before the CROSS
> > explosion) perform the cross inside groups and the complicated filtering.
> >
> > I can't find a way to make it run in the reduce phase...
> > What am I doing wrong?
