I was using Postgres from Heroku, which enforces SSL connections. I finally
got it working with a query like this:

=======================
--- set mapred.job.tracker=local;
add jar /location/to/lib/hive-contrib-0.8.1.jar;
add jar /location/to/lib/postgresql9jdbc3.jar;
set jdbc.drivers = org.postgresql.Driver;
set javax.jdo.option.ConnectionDriverName = org.postgresql.Driver;
CREATE TEMPORARY FUNCTION dboutput AS
'org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput';

select dboutput(
  'jdbc:postgresql://host:5432/dbname?user=ma_user&password=secret&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory',
  '', '',
  'insert into tbl_test(sample_col) values(?)',
  count(*))
from hive_table
where ds='2012-04-01'
group by something;
============================



But the insertion is super slow: it took about 105 minutes to insert 8,000
rows (FYI, I have no indexes on the table), presumably because dboutput issues
a separate single-row insert for every output row. An import this small should
take milliseconds.

Sqoop might fit in as a better solution until we have a much smarter dboutput
UDF.
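
For reference, here is a rough sketch of what the Sqoop route might look like
(the export directory, credentials, and delimiter below are placeholders, not
from my actual setup; it assumes the hourly result has already been written
out of Hive as '\001'-delimited text, e.g. via INSERT OVERWRITE DIRECTORY):

=======================
# sketch only -- placeholder path/credentials, same SSL params as the dboutput URL above
sqoop export \
  --connect 'jdbc:postgresql://host:5432/dbname?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory' \
  --username ma_user --password secret \
  --table tbl_test \
  --export-dir /tmp/hourly_counts \
  --input-fields-terminated-by '\001'
=======================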


-v_abhi_v



On Thu, Mar 29, 2012 at 10:11 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You have to look at the code to see what the return numbers mean for
> the UDF. In some cases the return is normal. For example, Hive map tasks use
> speculative execution, so the same insert can happen twice, violating a
> primary key. The second insert "fails" and returns non-zero, but in
> reality all that means is the row was already inserted.
>
> On Thu, Mar 29, 2012 at 6:23 AM, Abhishek Parolkar <abhis...@viki.com>
> wrote:
> > My situation requires me to run a Hive query every hour and insert the
> > selected records into a Postgres table. It would be nice if dboutput worked,
> > so that the reduce jobs (created by Hive) could write directly to the DB.
> >
> > With Sqoop, I would have to create a table in Hive every time and export it
> > to a table in the DB. I'm wondering if that can be avoided?
> >
> > -v_abhi_v
> >
> >
> > On Thu, Mar 29, 2012 at 6:12 PM, Bejoy KS <bejoy...@yahoo.com> wrote:
> >>
> >> Hi Abhishek
> >> To transfer data between an RDBMS and Hadoop, Sqoop is the preferred and
> >> recommended option. Once you have the processing done in Hive, the output
> >> data can be exported to PG with the sqoop export command.
> >> Regards
> >> Bejoy KS
> >>
> >> Sent from handheld, please excuse typos.
> >> ________________________________
> >> From: Abhishek Parolkar <abhis...@viki.com>
> >> Date: Thu, 29 Mar 2012 16:25:08 +0800
> >> To: <user@hive.apache.org>
> >> ReplyTo: user@hive.apache.org
> >> Subject: Postgres JDBC + dboutput UDF to export from Hive to remote
> >> Postgres
> >>
> >> Hi There,
> >>   I am trying to get the dboutput() UDF to work so that it can write
> >> results to a PG DB table.
> >>
> >> ==This is what I did in hive shell==
> >>
> >>   add jar /location/hive_contrib.jar;
> >>   add jar /location/postgresql9jdbc3.jar;
> >>   set jdbc.drivers = org.postgresql.Driver;
> >>
> >>   CREATE TEMPORARY FUNCTION dboutput AS
> >>   'org.apache.hadoop.hive.contrib.genericudf.example.GenericUDFDBOutput';
> >>
> >>   select dboutput('jdbc:postgresql//localhost:5432/test', '', '',
> >>     'insert into test_tbl(cnt) values(?)', hex(count(*)))
> >>   from some_hive_table
> >>
> >> ===========end of snip=======
> >>
> >> 1.) I am on a single-node cluster
> >> 2.) I am using Hive 0.8.1
> >> 3.) I am on Hadoop 1.0.0
> >> 4.) The query runs fine but doesn't write to the DB; it returns the number 2
> >> (http://screencast.com/t/eavnbBHR1x)
> >>
> >> I get a "No suitable driver" error (http://screencast.com/t/OipV14n9FgF).
> >> Can someone tell me how I can load the Postgres JDBC driver so
> >> that dboutput recognizes my Postgres.
> >>
> >> Any help?
> >>
> >> -v_abhi_v
> >
> >
>
