You are correct about what I am hoping to do: basically, emit two records
for every row. What was interesting was that when I just did the union in the
FROM clause, it didn't seem to do a double table scan. I ended up doing:
INSERT OVERWRITE TABLE table_summary
select col1, unioned_col, count(distinct col4) f
John,
Please correct me if I didn't understand the problem correctly.
I think in this scenario, it's best to think about the query in terms of
MapReduce. In this case, for each record sent as input to your mapper, you
would want two records to be emitted, one with col2's value and one with
col3's
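
To make the "two records per input row" idea concrete, here is a rough sketch in plain Python rather than Hive. It only simulates what a single round of mappers plus a distinct count would do. The function names map_row and summarize and the sample rows are made up for illustration; only col1 through col4 come from the thread, and this is not the query that was actually run.

```python
from collections import defaultdict


def map_row(row):
    """Emit two (key, value) pairs per input row: one keyed on col2, one on col3."""
    for unioned_col in (row["col2"], row["col3"]):
        yield (row["col1"], unioned_col), row["col4"]


def summarize(rows):
    """Read each row once and count distinct col4 values per (col1, unioned_col)."""
    distinct = defaultdict(set)
    for row in rows:  # single pass over the input, like one table scan
        for key, col4 in map_row(row):
            distinct[key].add(col4)
    return {key: len(values) for key, values in distinct.items()}


# Hypothetical sample data, not taken from the thread.
rows = [
    {"col1": "a", "col2": "x", "col3": "y", "col4": 1},
    {"col1": "a", "col2": "y", "col3": "z", "col4": 2},
    {"col1": "a", "col2": "x", "col3": "x", "col4": 1},
]
result = summarize(rows)
```

Each input row is read once and contributes to two groups, which is the effect the single-scan approach is after.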
I am trying to do a union, group by, and multi insert all at once. I know
this is convoluted, but what I am trying to do is avoid having to scan
through the original table more than once... if I can get all my data from
two columns that I want to pull together, in one round of mappers, I win...
Basi