Dear Pig users,

Can Pig combine sorting and unique-ing into a single job? Doing this:

    -- define Components, then
    Sorted_0 = order Components by block_id parallel $par;
    Sorted = DISTINCT Sorted_0;

causes one more MR job to be launched than simply doing this:

    -- define Components, then
    Sorted = order Components by block_id parallel $par;

It would seem there should be some way to do the distinct in the same pass as the sort, like 'sort -u'. But I can't see how.

Any tips would be much appreciated!

Thanks,
Will

William F Dowling
Senior Technologist
Thomson Reuters
