Re: Quering RDBMS table in a Hive query

2012-06-15 Thread Bejoy KS
Hi Ruslan, The solution Esteban pointed out was: 1. Import the lookup data from the RDBMS into HDFS/Hive (you can fire any ad hoc query here). If the data is just a few MB, one or two maps/connections are enough. 2. A lookup on this smaller data can then be achieved by joining it with the larger table
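
A minimal sketch of the lookup-then-join idea described in this message, written in Python rather than HiveQL and assuming the imported lookup data fits in memory. The function names (build_lookup, lookup_join) and the sample column names are hypothetical and do not come from the thread; in Hive itself this would be an ordinary join between the imported table and the larger one.

```python
def build_lookup(rows, key):
    """Index the small lookup table (e.g. rows imported from the RDBMS) by key."""
    lookup = {}
    for row in rows:
        lookup[row[key]] = row
    return lookup


def lookup_join(large_rows, lookup, key):
    """Stream the large table and attach matching lookup columns (inner join)."""
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Columns from the large table win on a name clash.
            yield {**match, **row}


if __name__ == "__main__":
    # Hypothetical sample data: a small lookup table and a larger fact table.
    countries = [
        {"country_id": 1, "name": "NL"},
        {"country_id": 2, "name": "CZ"},
    ]
    events = [
        {"event": "a", "country_id": 1},
        {"event": "b", "country_id": 3},  # no match, dropped by the inner join
        {"event": "c", "country_id": 2},
    ]
    lookup = build_lookup(countries, "country_id")
    for joined in lookup_join(events, lookup, "country_id"):
        print(joined)
```

Only the small table is held in memory and the large one is read once, which is why this approach suits lookup data of a few MB.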

Re: Quering RDBMS table in a Hive query

2012-06-15 Thread Ruslan Al-Fakikh
Hi Esteban, Your solution is what I am trying to avoid: having to keep the HDFS data up to date. I know I can easily schedule a dependency between the Sqoop import job and the Hive query job, and we currently have a scheduling tool (Opswise) for such things. But what if I just want to run an ad hoc

Block Sampling Impact

2012-06-15 Thread Ladda, Anand
Hi, I was trying block sampling on a 6-million-row table (~400 MB) and can see that if I sample about 1 percent of the data I get about 3x faster response on the queries (I can also see a difference in the data returned). The input format, though, is 'org.apache.hadoop.mapred.TextInputFormat' and not Co

Re: Block Sampling

2012-06-15 Thread Carl Steinbach
Done! On Fri, Jun 15, 2012 at 12:26 PM, Ladda, Anand wrote: > Thanks Carl. Could you give me edit rights to the wiki (ala...@microstrategy.com) to update the sampling page with this info > > From: Carl Steinbach [mailto:c...@cloudera.com] > Sent: Friday, June 15, 2012 3:20 P

RE: Block Sampling

2012-06-15 Thread Ladda, Anand
Thanks, Carl. Could you give me edit rights to the wiki (ala...@microstrategy.com) to update the sampling page with this info? From: Carl Steinbach [mailto:c...@cloudera.com] Sent: Friday, June 15, 2012 3:20 PM To: user@hive.apache.org Subject: Re: Block Sampling

Re: Block Sampling

2012-06-15 Thread Carl Steinbach
Hi Anand, This feature was implemented in HIVE-2121 and appeared in Hive 0.8.0. Ref: https://issues.apache.org/jira/browse/HIVE-2121 Thanks. Carl On Fri, Jun 15, 2012 at 11:59 AM, Ladda, Anand wrote: > Has the block sampling feature been added to one of the latest (Hive 0.8 > or Hive 0.9) re

Block Sampling

2012-06-15 Thread Ladda, Anand
Has the block sampling feature been added to one of the latest (Hive 0.8 or Hive 0.9) releases? The wiki has the blurb below on block sampling: "Block Sampling: It is a feature that is still on trunk and is not yet in any release version. block_sample: TABLESAMPLE (n PERCENT) This will allow Hive to

Re: Quering RDBMS table in a Hive query

2012-06-15 Thread Esteban Gutierrez
Hi Ruslan, Jan's approach sounds like a good workaround only if you can use the output in a mapjoin, but I don't think it will scale nicely if you have a very large number of tasks, since that will translate into DB connections to MySQL. I think a more scalable and reliable way is just to schedule

Re: Quering RDBMS table in a Hive query

2012-06-15 Thread Ruslan Al-Fakikh
Thanks, Jan. On Fri, Jun 15, 2012 at 4:35 PM, Jan Dolinár wrote: > On 6/15/12, Ruslan Al-Fakikh wrote: >> I didn't know InputFormat and LineReader could help, though I didn't >> look at them closely. I was thinking about implementing a >> Table-Generating Function (UDTF) if there is no already

Hive tar ball snapshot build

2012-06-15 Thread kulkarni.swar...@gmail.com
I was looking into the snapshot builds for hive[1] and noticed that there is no snapshot tar ball available. Is there a reason why we don't build them? If not, should we be adding that to the build so that interested people can simply pull this bleeding edge tar ball and start playing with it rathe

Re: Quering RDBMS table in a Hive query

2012-06-15 Thread Jan Dolinár
On 6/15/12, Ruslan Al-Fakikh wrote: > I didn't know InputFormat and LineReader could help, though I didn't > look at them closely. I was thinking about implementing a > Table-Generating Function (UDTF) if there is no already implemented > solution. Both are possible, InputFormat and/or UD(T)F.

Re: Quering RDBMS table in a Hive query

2012-06-15 Thread Ruslan Al-Fakikh
Thanks Jan, I didn't know InputFormat and LineReader could help, though I didn't look at them closely. I was thinking about implementing a Table-Generating Function (UDTF) if there is no already implemented solution. Ruslan On Thu, Jun 14, 2012 at 10:03 AM, Jan Dolinár wrote: > Hi Ruslan, >

Hive-0.8.1 PHP Thrift client broken?

2012-06-15 Thread Ruben de Vries
Hey guys, I've slammed my head into a wall on this issue before, but now that I'm a bit more familiar with Hive and Thrift (I got the Python version working) I figured I should try fixing the problem or find out more about it, to contribute something to the project too :) The PHP Thrift client