Dear Pig users,
Can Pig combine sorting and removing duplicates into a single job?  Doing this:
--define Components, then
Sorted_0 = ORDER Components BY block_id PARALLEL $par;
Sorted = DISTINCT Sorted_0;

causes one more MR job to be launched than simply doing this:
--define Components, then
Sorted = ORDER Components BY block_id PARALLEL $par;

It would seem there should be some way to do the distinct in the same pass as 
the sort, like 'sort -u'.  But I can't see how. Any tips would be much 
appreciated!
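
To make the 'sort -u' analogy concrete, here is a minimal sketch in Python
(not Pig) of the behaviour I am after: sort the records, then drop exact
duplicates while walking the sorted output once.  The names sort_unique,
records and key are hypothetical and only for illustration; this says nothing
about how Pig plans its jobs.

```python
def sort_unique(records, key):
    """Sort records by key and drop exact duplicates, like 'sort -u'."""
    result = []
    # Sorting on (key, whole record) makes exact duplicates adjacent,
    # so one pass over the sorted output is enough to remove them.
    for rec in sorted(records, key=lambda r: (key(r), r)):
        if not result or rec != result[-1]:
            result.append(rec)
    return result


if __name__ == "__main__":
    # Hypothetical (block_id, payload) tuples standing in for Components.
    components = [(2, "b"), (1, "a"), (2, "b"), (1, "c")]
    print(sort_unique(components, key=lambda r: r[0]))
```

The point is that deduplication needs no separate pass once duplicates are
adjacent, which is what I would hope a combined Pig job could exploit.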

Thanks,
Will

William F Dowling
Senior Technologist
Thomson Reuters
