Hi,

I recently tried Hive-645 feature and save query results directly to Mysql 
table. The feature can be found here: 
https://issues.apache.org/jira/browse/HIVE-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel.
The query I tried looks like this:

hive>CREATE TEMPORARY FUNCTION dboutput AS 
'org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput';
hive>SELECT dboutput('jdbc:mysql://localhost/hdfs_db','uid','pwd','INSERT INTO 
dc(t,c) VALUES (?,?)',requestbegintime,count(1)) FROM impressions2 GROUP BY 
requestbegintime;

It works, but the reduce tasks are very slow:

Task

Complete

Status

Start Time

Finish Time

Errors

Counters

task_201203081246_0001_r_000000<http://wlu-hadoop01:50030/taskdetails.jsp?jobid=job_201203081246_0001&tipid=task_201203081246_0001_r_000000>

100.00%


reduce > reduce

8-Mar-2012 12:47:06

8-Mar-2012 13:33:54 (46mins, 47sec)

11<http://wlu-hadoop01:50030/taskstats.jsp?jobid=job_201203081246_0001&tipid=task_201203081246_0001_r_000000>


I set #reduce to be 4 but is still very slow (finally 171, 667 rows are 
inserted to Mysql).

I guess the reduce process didn't insert data to MySql in batch mode, can 
anyone give me some suggestions to improve the performance??
PS: I think it might be better to first save results to HDFS and then use Sqoop 
to load data to Mysql, right??

Regards,
Wei

Reply via email to