Hi Pig users, I have a question about how to handle a large bag of data in the reduce step. After I run the script below, each group has about 100 GB of data to process. The bag spills continuously and the job is very slow. What would you recommend for speeding up processing when you find yourself with a large bag of data (over 100 GB) to process?
A = LOAD '/tmp/data';
B = GROUP A BY $0;
C = FOREACH B GENERATE FLATTEN($1); -- this takes very, very long because of the large bag

Best Regards,
Jerry
