I was using Postgres from Heroku, which enforces SSL connections. I finally got it working with a query like this:
=======================
set mapred.job.tracker=local;
add jar /location/to/lib/hive-contrib-0.8.1.jar;
add jar /location/to/lib/postgresql9jdbc3.jar;
set jdbc.drivers = org.postgresql.Driver;
set javax.jdo.option.ConnectionDriverName = org.postgresql.Driver;

CREATE TEMPORARY FUNCTION dboutput
  AS 'org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput';

select dboutput(
  'jdbc:postgresql://host:5432/dbname?user=ma_user&password=secret&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory',
  '', '',
  'insert into tbl_test(sample_col) values(?)',
  count(*))
from hive_table
where ds='2012-04-01'
group by something
============================

But the insertion is super slow: it took about 105 minutes to insert 8000 rows (FYI, I have no index on the table). This should have been an import taking milliseconds. Sqoop might fit in as the better solution until we have a much smarter dboutput UDF.

-v_abhi_v

On Thu, Mar 29, 2012 at 10:11 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You have to look at the code to see what the return numbers mean for
> the UDF. In some cases the return is normal. For example, Hive map tasks use
> speculative execution and the same insert happens twice, violating a
> primary key. The second insert "fails" and produces a non-zero return, but in
> reality all that means is the row was already inserted.
>
> On Thu, Mar 29, 2012 at 6:23 AM, Abhishek Parolkar <abhis...@viki.com> wrote:
> > My situation requires me to run a Hive query every hour and insert selected
> > records into a Postgres table. It would be nice if dboutput worked so that
> > reduce jobs (created by Hive) can write directly to the DB.
> >
> > With Sqoop, I will have to create a table every time in Hive and export it to
> > a table in the DB. Wondering if that can be avoided?
> >
> > -v_abhi_v
> >
> >
> > On Thu, Mar 29, 2012 at 6:12 PM, Bejoy KS <bejoy...@yahoo.com> wrote:
> >>
> >> Hi Abhishek
> >> To transfer data between an RDBMS and Hadoop, Sqoop is the preferred and
> >> recommended option. Once you have the process done in Hive, the output data
> >> can be exported to PG with the sqoop export command.
> >>
> >> Regards
> >> Bejoy KS
> >>
> >> Sent from handheld, please excuse typos.
> >> ________________________________
> >> From: Abhishek Parolkar <abhis...@viki.com>
> >> Date: Thu, 29 Mar 2012 16:25:08 +0800
> >> To: <user@hive.apache.org>
> >> ReplyTo: user@hive.apache.org
> >> Subject: Postgres JDBC + dboutput UDF to export from Hive to remote
> >> Postgres
> >>
> >> Hi There,
> >> I am trying to get the dboutput() UDF to work so that it can write results to
> >> a PG DB table.
> >>
> >> ==This is what I did in the hive shell==
> >>
> >> add jar /location/hive_contrib.jar;
> >> add jar /location/postgresql9jdbc3.jar;
> >> set jdbc.drivers = org.postgresql.Driver;
> >>
> >> CREATE TEMPORARY FUNCTION dboutput
> >>   AS 'org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput';
> >>
> >> select dboutput('jdbc:postgresql//localhost:5432/test','','','insert
> >> into test_tbl(cnt) values(?)',hex(count(*)))
> >> from some_hive_table
> >>
> >> ===========end of snip=======
> >>
> >> 1.) I am on a single-node cluster
> >> 2.) I am using Hive 0.8.1
> >> 3.) I am on Hadoop 1.0.0
> >> 4.) The query runs fine but doesn't write to the DB; it returns the number 2
> >> (http://screencast.com/t/eavnbBHR1x)
> >>
> >> I get a "no suitable driver" error (http://screencast.com/t/OipV14n9FgF); can
> >> someone tell me how I can load the Postgres JDBC driver such
> >> that dboutput recognizes my Postgres?
> >>
> >> Any help?
> >>
> >> -v_abhi_v
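
In case it helps anyone else reading this thread, here is a rough sketch of the Sqoop route suggested above: stage the hourly result into a Hive table, then push that table's HDFS directory into Postgres with sqoop export. The staging table name, warehouse path, delimiter, and credentials below are assumptions on my side, not values I have verified end to end:

=======================
-- in Hive: stage the hourly result into a table backed by a known HDFS dir
CREATE TABLE IF NOT EXISTS pg_export_staging (sample_col BIGINT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001';

INSERT OVERWRITE TABLE pg_export_staging
SELECT count(*) FROM hive_table WHERE ds='2012-04-01' GROUP BY something;

# from the shell: export the staged rows into the Postgres table
sqoop export \
  --connect "jdbc:postgresql://host:5432/dbname?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory" \
  --username ma_user --password secret \
  --table tbl_test \
  --export-dir /user/hive/warehouse/pg_export_staging \
  --input-fields-terminated-by '\001'
============================

This still means overwriting a staging table every hour, which is exactly the extra step I was hoping dboutput would avoid, but it should be far faster than one JDBC insert per group.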