> On Fri, Jun 1, 2012 at 5:25 PM, shan s <mysub...@gmail.com> wrote:
>
>> I am using Multi-GroupBy-Insert. I was expecting a single map-reduce job
>> which would club the group-bys together.
>> However, it is scheduling n jobs, where n = number of group-bys.
>> Could you please explain this behaviour?
>
> No, it will result in at least as many jobs as there are group-bys. The efficiency lies not in lowering the number of jobs, but in the fact that the first job usually reduces the amount of data that the rest need to go through. For example, if the FROM clause includes a subquery, or the group-bys have similar WHERE clauses, this "pre-selection" is executed first, and the subsequent jobs operate on its results instead of the entire table/partition and are therefore much faster.
J. Dolinar
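The pre-selection effect described in the reply can be illustrated with a toy Python model. This is not Hive code and not how Hive's planner is implemented; it only counts rows read. It assumes a table is a list of dicts, that all group-bys share one WHERE predicate, and the function names (`run_multi_groupby`, `run_separately`) are hypothetical. The job count is similar in both cases; what shrinks is the data the group-by passes read.

```python
from collections import Counter


def run_multi_groupby(table, shared_filter, group_keys):
    """Toy model of a multi-group-by insert: one pre-selection pass over the
    full table, then one aggregation pass per group-by over the selected rows."""
    rows_read = 0
    # "First job": apply the common WHERE clause once to the whole table.
    rows_read += len(table)
    selected = [row for row in table if shared_filter(row)]
    results = {}
    # Remaining jobs: each GROUP BY scans only the pre-selected rows.
    for key in group_keys:
        rows_read += len(selected)
        results[key] = Counter(row[key] for row in selected)
    return results, rows_read


def run_separately(table, shared_filter, group_keys):
    """Baseline: each GROUP BY query scans and filters the whole table itself."""
    rows_read = 0
    results = {}
    for key in group_keys:
        rows_read += len(table)
        results[key] = Counter(row[key] for row in table if shared_filter(row))
    return results, rows_read


if __name__ == "__main__":
    # Hypothetical table: 1000 rows, 10% of which match year == 2012.
    table = [
        {
            "year": 2012 if i % 10 == 0 else 2011,
            "country": ["us", "de"][i % 2],
            "device": ["web", "mobile", "tv"][i % 3],
        }
        for i in range(1000)
    ]
    keep_2012 = lambda row: row["year"] == 2012
    keys = ["country", "device"]

    multi, multi_read = run_multi_groupby(table, keep_2012, keys)
    sep, sep_read = run_separately(table, keep_2012, keys)
    print(multi == sep)           # True: same aggregates either way
    print(multi_read, sep_read)   # 1200 2000
```

With two group-bys over a selective filter, the shared pre-selection reads 1000 + 2 × 100 = 1200 rows instead of 2 × 1000 = 2000; the more selective the common WHERE clause (or subquery), the larger the saving.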