Hi!

After hitting the "curse of the last reducer" many times on LEFT OUTER JOIN queries, and thinking it over, I have come to the conclusion that I am missing something about how keys are handled in mapred jobs.
The problem shows up when table A contains billions of rows with distinct keys and I need to join it to table B, which has far fewer rows. I need to keep all the A rows, with NULL values on the B side wherever there is no match, which is what a LEFT OUTER JOIN is for.

Now, when this is transformed into a mapred job, my naive understanding would be that for every key in table A that is missing from table B, a row with that key and a NULL value would be generated on the B side. If that were the case, I fail to understand why all the NULL-valued B rows would end up on the same reducer, since the key determines which reducer is used, not the value. So, obviously, this is not how it works.

So my question is: how is this construct handled?

Thanks a lot!

D.Morel
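
P.S. To make my mental model concrete, here is a minimal Python sketch of what I assume happens. It is not Hive's or Hadoop's actual code: the reducer count, the function names and the use of CRC32 as the hash are all made up for illustration. It only mimics the usual "hash of the key modulo the number of reducers" partitioning idea.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 4  # hypothetical number of reducers


def partition(key, num_reducers=NUM_REDUCERS):
    """Choose a reducer from the key alone (hash of key modulo reducer count).

    CRC32 stands in for the real hash function; this is an assumption
    made only to keep the sketch deterministic.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def reducer_load(rows):
    """Count how many (key, value) rows are sent to each reducer."""
    return Counter(partition(key) for key, _value in rows)


# My naive model: every A key is distinct and the B-side value is NULL (None).
# The value never enters partition(), so the rows should spread out.
distinct_keys = [(k, None) for k in range(10000)]

# For contrast only: rows that all share one key pile up on one reducer.
shared_key = [("same-key", None) for _ in range(10000)]

if __name__ == "__main__":
    print("distinct keys:", dict(reducer_load(distinct_keys)))
    print("shared key:   ", dict(reducer_load(shared_key)))
```

Under this model the NULL values are irrelevant to the distribution, and only rows sharing the same key can pile up on one reducer, which is why the skew I observe puzzles me.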