Re: confused on different behavior of Bucketized tables do not support INSERT INTO

2012-05-31 Thread Bruce Bian
t.foo; So what's the reason for throwing that error (I mean, why not support INSERT INTO a bucketized table from the same table)? And isn't that error message kind of misleading? On Thu, May 31, 2012 at 6:43 PM, Bruce Bian wrote: > I'm using hive 0.9.0 > > On Thurs

Re: confused on different behavior of Bucketized tables do not support INSERT INTO

2012-05-31 Thread Bruce Bian
I'm using hive 0.9.0 On Thursday, May 31, 2012, Bruce Bian wrote: > Hi, > I've got a table vt_new_data which is defined as follows: > CREATE TABLE VT_NEW_DATA > ( > V_ACCOUNT_NUM string > ,V_ACCOUNT_MODIFIER_NUM string > ,V_DEPOSIT_TYPE_CD s

Condition for doing a sort merge bucket map join

2012-05-22 Thread Bruce Bian
Hi, I've got 7 large tables to join (each ~10G in size) into one table, all with the same *2* join keys. I've read some documents on sort merge bucket map join, but failed to trigger it. I've bucketed all 7 tables into 20 buckets and sorted by one of the join keys, set hive.optimize.bucketmapjoi
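For reference, a minimal sketch of the setup usually described for a sort merge bucket map join, assuming Hive 0.9-era property names. The tables big_a and big_b and their columns are hypothetical, not the tables from this thread, and the bucket count of 20 just mirrors the post. Whether the join actually gets converted should be checked in the EXPLAIN output.

```sql
-- Hypothetical tables: both bucketed AND sorted on the full set of join keys,
-- with the same number of buckets.
CREATE TABLE big_a (k1 STRING, k2 STRING, val_a STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

CREATE TABLE big_b (k1 STRING, k2 STRING, val_b STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

-- Make INSERTs honor the bucketing and sorting declared above.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- Settings commonly listed for enabling the sort merge bucket map join.
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;

-- The MAPJOIN hint requests a map-side join with b as the hinted table;
-- EXPLAIN shows which join operator Hive actually picked.
EXPLAIN
SELECT /*+ MAPJOIN(b) */ a.k1, a.k2, a.val_a, b.val_b
FROM big_a a JOIN big_b b ON a.k1 = b.k1 AND a.k2 = b.k2;
```

In this sketch the tables are bucketed and sorted on both join keys rather than only one of them; that is an assumption about what the optimizer expects, not something confirmed in this thread.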

Re: how is number of mappers determined in mapside join?

2012-03-20 Thread Bruce Bian
the way 32 mb is too small for a hdfs block size, you may hit NN memory > issues pretty soon. Consider increasing it at least to 64 mb, though all > larger clusters use either 128 or 256 Mb blocks. > > Hope it helps!.. > > Regards > Bejoy > > ---

Re: how is number of mappers determined in mapside join?

2012-03-19 Thread Bruce Bian
point me to the > document from which you got this? > > Regards > Bejoy > > -- > *From:* Bruce Bian > *To:* user@hive.apache.org > *Sent:* Monday, March 19, 2012 2:42 PM > *Subject:* how is number of mappers determined in mapside join?

how is number of mappers determined in mapside join?

2012-03-19 Thread Bruce Bian
Hi there, when I'm executing the following queries in Hive: set hive.auto.convert.join = true; CREATE TABLE IDAP_ROOT AS SELECT a.*, b.acnt_no FROM idap_pi_root a LEFT OUTER JOIN idap_pi_root_acnt b ON a.acnt_id = b.acnt_id; the number of mappers run in the map-side join is 3. How is it determined?

Reduce the number of map/reduce jobs during join

2012-03-13 Thread Bruce Bian
mber of reducers will be used. If the one > specified in the configuration parameter mapred.reduce.tasks is > negative, hive will use this one as the max number of reducers when > automatically determine number of reducers. > > > Thanks and Regards > > Jagat >
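The quoted text above reads like the description of hive.exec.reducers.max; that is an assumption, since the snippet is cut off before the property name. A minimal sketch of how these settings are typically combined, with purely illustrative values:

```sql
-- A negative value tells Hive to estimate the number of reducers itself.
SET mapred.reduce.tasks = -1;

-- Upper bound applied to that estimate (20 is an illustrative value).
SET hive.exec.reducers.max = 20;

-- Amount of input data per reducer used for the estimate (illustrative value).
SET hive.exec.reducers.bytes.per.reducer = 256000000;
```

Setting mapred.reduce.tasks to a positive number instead fixes the reducer count and bypasses the estimate.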

Re: HFileInputFormat for MapReduce

2012-02-09 Thread Bruce Bian
I also encountered this issue when comparing Hive+HBase with Hive+HDFS (native Hive tables). After some tuning (ensuring data locality, using the scan cache, an appropriate number of mappers per node, etc.), Hive+HBase is around 4~5X slower. I guess the two main reasons are: 1) HFile repeats keys for each K/V p