FULL OUTER JOIN Two Small Tables More Efficiently in Hive?

2013-10-16 Thread Ji Zhang
Hi All, I have two tables. One has 2,000,000 rows (150M in 6 files), and the other has 5,000 rows (400K in 1 file). The join is (approximately) a full outer join, since the city_id field has only 100 distinct values: CREATE TABLE prop_total AS SELECT * FROM prop_1 a JOIN prop_2 b ON a.city_id = b
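
To see why this join is costly despite the small inputs, the output size can be estimated from the figures in the message. The sketch below assumes rows are spread evenly across the 100 city_id values in both tables, which the message does not state; the variable names are illustrative only.

```python
# Rough estimate of the join's output size, using the figures from the message.
# Assumption (not stated in the message): rows are evenly distributed over city_id.
rows_prop_1 = 2_000_000
rows_prop_2 = 5_000
distinct_city_ids = 100

rows_per_city_1 = rows_prop_1 // distinct_city_ids   # rows per city_id in prop_1
rows_per_city_2 = rows_prop_2 // distinct_city_ids   # rows per city_id in prop_2

# Every prop_1 row for a city matches every prop_2 row for the same city.
estimated_output_rows = rows_per_city_1 * rows_per_city_2 * distinct_city_ids
print(estimated_output_rows)
```

Under that assumption each input row fans out to many output rows, so the result is far larger than either input table.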

Fail to Increase Hive Mapper Tasks?

2014-01-01 Thread Ji Zhang
Hi, I have a managed Hive table that contains a single 150MB file. When I run "select count(*) from tbl" on it, the query uses 2 mappers, and I want to increase that number. First I tried 'set mapred.max.split.size=8388608;', expecting it to use 19 mappers, but it only uses 3. Somehow it stil
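
The expected figure of 19 mappers follows from dividing the file size by the maximum split size. A minimal sketch of that arithmetic, assuming the "150MB" file is exactly 150 MiB and that one mapper is started per split:

```python
import math

# Assumption: the "150MB" file is exactly 150 MiB.
file_size_bytes = 150 * 1024 * 1024
# Value passed to mapred.max.split.size in the message (8 MiB).
max_split_size = 8388608

# One mapper per split; the last split may be smaller than the maximum.
expected_mappers = math.ceil(file_size_bytes / max_split_size)
print(expected_mappers)
```

This only reproduces the poster's expectation; it does not model how the input format in use actually computes splits.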

Re: Fail to Increase Hive Mapper Tasks?

2014-01-02 Thread Ji ZHANG
VE is using the old Hadoop MapReduce API and so > mapred.max.split.size won't work. > > -----Original Message- > From: Ji Zhang [mailto:zhangj...@gmail.com] > Sent: Thursday, January 02, 2014 3:56 PM > To: user@hive.apache.org > Subject: Fail to Increase Hive Mapper T

CDH4.5 HiveServer2 InterruptedException

2014-08-17 Thread Ji ZHANG
Hi, I'm using CDH4.5 and its built-in HiveServer2. Sometimes it throws the following exception, and the job cannot be submitted: 2014-08-18 09:16:33,346 INFO org.apache.hadoop.hive.ql.exec.ExecDriver: Making Temp Directory: hdfs://nameservice1/tmp/hive-hive-hadoop/hive_2014-08-18_09-16-32_093_332