Re: Re: Re: multiple tables join with only one huge table.

2011-08-14 Thread Daniel,Wu
A simple use case: for retail data that keeps 10 years of history, there are 10 * 365 = 3,650 records in the calendar dimension. If there are 8,000 stores and 8,000 products, the sales table will have 8,000 * 8,000 * 3,650 = 233,600,000,000 records if we have one record for each product/day/store combination.
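
The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that, like the estimate, ignores leap days; it also shows how small the three dimension tables are compared with the dense fact table.

```python
# Dense sales fact table: one row per product/day/store combination.
YEARS = 10
DAYS_PER_YEAR = 365  # leap days ignored, as in the estimate
STORES = 8_000
PRODUCTS = 8_000

calendar_rows = YEARS * DAYS_PER_YEAR
fact_rows = STORES * PRODUCTS * calendar_rows

# All three dimension tables together are tiny next to the fact table.
dimension_rows = calendar_rows + STORES + PRODUCTS

print(f"calendar dimension rows: {calendar_rows:,}")
print(f"all dimension rows:      {dimension_rows:,}")
print(f"sales fact rows:         {fact_rows:,}")
```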

Re: Re: multiple tables join with only one huge table.

2011-08-13 Thread Koert Kuipers
I am not aware of any optimization that does something like that. Anyone? Also, your suggestion means 10 hash tables would have to be held in memory. I think that with a normal map-reduce join in Hive you can join 10 tables at once (meaning in a single map-reduce job) if they all join on the same key.
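
Why a shared join key allows a single job can be sketched in plain Python. This is a simplified simulation of a reduce-side join, not Hive's actual implementation: the map phase tags every row with its source table and emits it under the join key, the shuffle groups rows by key, and each reducer builds the inner join from the rows it received. The table and column names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(tables):
    """Emit (join_key, (table_name, row)) for every row of every table."""
    for name, (key_col, rows) in tables.items():
        for row in rows:
            yield row[key_col], (name, row)


def shuffle(pairs):
    """Group tagged rows by join key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(groups, table_names):
    """Inner join: cross product of the rows each table sent for a key."""
    for key, tagged_rows in groups.items():
        by_table = defaultdict(list)
        for name, row in tagged_rows:
            by_table[name].append(row)
        # Skip keys that are missing from any table (inner join semantics).
        if all(by_table[n] for n in table_names):
            for combo in product(*(by_table[n] for n in table_names)):
                merged = {}
                for row in combo:
                    merged.update(row)
                yield merged


# Hypothetical tables that all join on the same key, store_id.
tables = {
    "sales": ("store_id", [{"store_id": 1, "amount": 10}, {"store_id": 2, "amount": 5}]),
    "stores": ("store_id", [{"store_id": 1, "city": "Oslo"}, {"store_id": 2, "city": "Lima"}]),
    "managers": ("store_id", [{"store_id": 1, "manager": "Ann"}]),
}
joined = list(reduce_phase(shuffle(map_phase(tables)), list(tables)))
print(joined)
```

Because every table is keyed on `store_id`, one shuffle brings all matching rows to the same reducer. A star schema, where each dimension joins the fact table on a different key, does not have this property.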

Re: multiple tables join with only one huge table.

2011-08-12 Thread Koert Kuipers
A mapjoin does what you described: it builds hash tables for the smaller tables. In recent versions of Hive (like the one I am using with Cloudera CDH3u1), a mapjoin will be done for you automatically if your parameters are set correctly. The relevant parameters in hive-site.xml are: hive.auto.
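
The kind of decision an automatic mapjoin conversion has to make can be sketched as follows. This is a simplified illustration under stated assumptions, not Hive's actual planner logic: the largest table is streamed, and a map join is chosen only if the remaining tables together fit within a memory budget. `memory_budget_bytes` is a hypothetical stand-in, not a real Hive parameter name, and the table sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str
    size_bytes: int


def plan_join(tables, memory_budget_bytes):
    """Stream the largest table; map-join only if all the others fit in memory.

    memory_budget_bytes is hypothetical: it stands in for whatever limit the
    real configuration uses on the hash tables every mapper must hold.
    """
    big = max(tables, key=lambda t: t.size_bytes)
    small = [t for t in tables if t is not big]
    if sum(t.size_bytes for t in small) <= memory_budget_bytes:
        return ("mapjoin", big.name, [t.name for t in small])
    return ("reduce-side join", None, [t.name for t in tables])


# Made-up sizes for a star schema: one huge fact table, three small dimensions.
tables = [
    TableStats("sales", 10 * 1024**4),
    TableStats("calendar", 100 * 1024),
    TableStats("stores", 1024**2),
    TableStats("products", 2 * 1024**2),
]
print(plan_join(tables, memory_budget_bytes=10 * 1024**2))
print(plan_join(tables, memory_budget_bytes=1024**2))
```

Checking the combined size, rather than each table separately, reflects the point made in the thread that all of the small tables' hash tables have to be in memory at once.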

Re: multiple tables join with only one huge table.

2011-08-11 Thread Ayon Sinha
The MAPJOIN hint helps optimize the join by loading the smaller tables specified in the hint into memory, so every small table is held in memory in each mapper. -Ayon
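
A map-side hash join as described above can be simulated in Python. This is a sketch of the general technique, not Hive's code: each mapper builds an in-memory dict for every small table, then streams its split of the big table and probes those dicts, so no reduce phase is needed. It assumes dimension keys are unique; all table and column names are hypothetical.

```python
def build_hash_tables(small_tables):
    """One in-memory dict per small table: join key -> row (keys assumed unique)."""
    return {
        name: {row[key_col]: row for row in rows}
        for name, (key_col, rows) in small_tables.items()
    }


def mapper(split, hash_tables, fact_keys):
    """Stream one split of the big table, probing every in-memory hash table.

    fact_keys maps each small-table name to the fact column referencing it.
    Rows without a match in some table are dropped (inner join).
    """
    for fact_row in split:
        out = dict(fact_row)
        for name, table in hash_tables.items():
            match = table.get(fact_row[fact_keys[name]])
            if match is None:
                break
            out.update(match)
        else:
            yield out


def map_join(big_splits, small_tables, fact_keys):
    results = []
    for split in big_splits:
        # Every mapper holds its own full copy of the small tables.
        hash_tables = build_hash_tables(small_tables)
        results.extend(mapper(split, hash_tables, fact_keys))
    return results


small_tables = {
    "stores": ("store_id", [{"store_id": 1, "city": "Oslo"}, {"store_id": 2, "city": "Lima"}]),
    "products": ("product_id", [{"product_id": 7, "product": "tea"}]),
}
fact_keys = {"stores": "store_id", "products": "product_id"}
big_splits = [
    [{"store_id": 1, "product_id": 7, "qty": 3}],
    [{"store_id": 2, "product_id": 7, "qty": 1}, {"store_id": 2, "product_id": 9, "qty": 4}],
]
rows = map_join(big_splits, small_tables, fact_keys)
print(rows)
```

Unlike the reduce-side join, the dimensions here may each use a different key, because the big table is never shuffled; the cost is that every mapper pays the memory for all the small tables.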