HiveMetaStoreClient

2015-08-25 Thread Jerrick Hoang
Hi all, I want to interact with HiveMetaStore tables from code and was looking at http://hive.apache.org/javadocs/r0.13.1/api/metastore/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.html and was wondering whether this is the correct way to do this or whether I should use a JDBC client. If HiveMetaStoreClie

Re: UDF Configure method not getting called

2015-08-25 Thread Jason Dere
For getting the configuration without configure(), this may not be the best thing to do, but you can try doing it in your UDF's initialize() method. Note that initialize() is called during query compilation, and also by each M/R task (followed at some point by configure()). During initialize() you c

Re: Run multiple queries simultaneously

2015-08-25 Thread Raajay
The back-end execution engine is Tez, and I use YARN for resource management. I completely agree with your deduction that the impact on the run time will be dependent on the nature of the queries. I would like to conduct some experiments (for a given workload, cluster configuration) to quantify th

Re: UDF Configure method not getting called

2015-08-25 Thread Rahul Sharma
Or alternatively, is there a way to pass configuration without using the configure method? The configuration to the UDF is essentially a list of parameters that tell the UDF what it should morph into this time and what kind of work it should perform. If there is an all-encompassing way to do tha

Re: UDF Configure method not getting called

2015-08-25 Thread Rahul Sharma
Oh, thanks for the reply, Jason. That was my suspicion too. The UDF in our case is not a function per se in the pure mathematical sense of the word 'function', because it doesn't take in a value and give out another value. It has side effects that form input for another MapReduce job. The po

Re: UDF Configure method not getting called

2015-08-25 Thread Jason Dere
There might be a few cases where a UDF is executed locally and not as part of a Map/Reduce job:
- Hive might choose not to run an M/R task for your query (see hive.fetch.task.conversion)
- If the UDF is deterministic and has deterministic inputs, Hive might decide to run the UDF once to get

Re: UDF Configure method not getting called

2015-08-25 Thread Rahul Sharma
It also seems like the UDF is being run on the client machine (I am using Beeline). No MapReduce job gets spawned. I removed the LIMIT clause, as I found that solved the issue for someone else on the mailing list. However, still no luck. I looked at the MapredContext class's needConfigure method and

RE: Run multiple queries simultaneously

2015-08-25 Thread Ryan Harris
You need to be a bit clearer about your environment and objective here. What is your back-end execution engine? MapReduce, Spark, or Tez? What are you using for resource management? YARN or MapReduce? The running time of one query in the presence of other queries will depend entirely on the

Re: HiveServer2 & Kerberos

2015-08-25 Thread Sergey Shelukhin
Sure! From: Loïc Chanel <loic.cha...@telecomnancy.net> Reply-To: user@hive.apache.org Date: Tuesday, August 25, 2015 at 00:23 To: user@hive.apache.org Subject: Re: Hi

Re: Run multiple queries simultaneously

2015-08-25 Thread Sergey Shelukhin
You can start HiveServer2, then submit queries to it using JDBC. If you open multiple sessions using multiple threads, you will be able to submit queries in parallel, although compilation is currently still serialized. From: Raajay <raaja...@gmail.com> Reply-To: user@hive.apache.org
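A minimal Python sketch of the "alone vs. under load" experiment, following the one-session-per-thread approach above. `run_query` is a hypothetical placeholder that only sleeps; connecting it to HiveServer2 (through JDBC or another client) is an assumption left to the reader, as are the query strings.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_query(sql: str) -> None:
    """Hypothetical placeholder for executing `sql` in its own session.

    In a real experiment this would open a HiveServer2 session (for example
    through a JDBC or other client) and run the query to completion. Here it
    only sleeps, so the harness can be exercised without a cluster.
    """
    time.sleep(0.2)


def time_query(sql: str) -> float:
    """Return the wall-clock seconds taken by one query."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def compare(target: str, background: List[str]) -> Tuple[float, float]:
    """Time `target` alone, then again while `background` queries run."""
    alone = time_query(target)
    # One worker thread per query, so each query gets its own session.
    with ThreadPoolExecutor(max_workers=len(background) + 1) as pool:
        for sql in background:
            pool.submit(run_query, sql)
        contended = pool.submit(time_query, target).result()
    return alone, contended


if __name__ == "__main__":
    # Hypothetical query strings; substitute your own workload.
    alone, contended = compare("SELECT ...", ["SELECT ...", "SELECT ..."])
    print(f"alone: {alone:.2f}s, with background load: {contended:.2f}s")
```

Since compilation is serialized in HiveServer2, part of the contended time may be spent waiting to compile rather than executing.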

Re: CBO - get cost of the plan

2015-08-25 Thread John Pullokkaran
#1 The row count estimate for "tableA" inner join "tableC": This depends on the selectivity of the join. The formula is Cardinality(A) * Cardinality(C) * Selectivity. We do have logic to infer a PK-FK relationship based on cardinality and NDV. #2 What is the definition of cumulative cost? This is the total
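The join row-count formula above, written as a small Python function. How the selectivity itself is derived is not shown here; for an equi-join a common textbook heuristic is 1 / max(NDV(a), NDV(c)), but that is an assumption and not necessarily what Hive's CBO computes. The table sizes in the example are hypothetical.

```python
def estimate_join_rows(card_a: float, card_c: float, selectivity: float) -> float:
    """Row-count estimate for A INNER JOIN C: |A| * |C| * selectivity.

    `selectivity` is the fraction of the cross product that survives the
    join predicate, so it must lie in [0, 1].
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")
    return card_a * card_c * selectivity


# Hypothetical numbers: 1,000 rows in tableA, 500 rows in tableC,
# and an assumed join selectivity of 0.001.
print(estimate_join_rows(1_000, 500, 0.001))
```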

UDF Configure method not getting called

2015-08-25 Thread Rahul Sharma
Hi guys, We have a UDF which extends GenericUDF and does some configuration within the public void configure(MapredContext ctx) method. The MapredContext passed to the configure method gives access to the Hive configuration via JobConf, which contains custom attributes of the form xy.abc.something. Reading these

RE: Loading multiple file format in hive

2015-08-25 Thread Ryan Harris
A few things: 1) If you are using Spark Streaming, I don't see any reason why the output of your Spark Streaming job can't match the necessary destination format; you shouldn't need a second job to read the output from Spark Streaming and convert it to Parquet. Do a search for Spark Streaming and la

Re: Repair table doesnt update the transient_lastDdlTime of updated partitions.

2015-08-25 Thread ravi teja
Thanks a lot Noam, you are a saviour! Ravi On Tue, Aug 25, 2015 at 10:03 PM, Noam Hasson wrote: > Hi, > > Check if this helps you: > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionTouch > > Noam. > > On Tue, Aug 25, 2015 at 6:43 PM, r

Re: Repair table doesnt update the transient_lastDdlTime of updated partitions.

2015-08-25 Thread Noam Hasson
Hi, Check if this helps you: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionTouch Noam. On Tue, Aug 25, 2015 at 6:43 PM, ravi teja wrote: > Sorry For the incomplete mail, sent bymistake > > I am working towards a incremental solution o
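The linked wiki section documents `ALTER TABLE table_name TOUCH [PARTITION partition_spec];`, and Ravi's reply suggests it solved his problem. If many partitions were rewritten outside Hive, a small Python helper can generate one statement per partition. The table name and partition values below are hypothetical, and the list of changed partitions is assumed to be known already.

```python
from typing import Dict, List


def _quote(val: str) -> str:
    """Render a value as a single-quoted HiveQL string literal."""
    return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"


def touch_statements(table: str, partitions: List[Dict[str, str]]) -> List[str]:
    """Build one ALTER TABLE ... TOUCH PARTITION statement per partition spec.

    Each spec maps partition column -> value, in partition-column order.
    """
    statements = []
    for spec in partitions:
        cols = ", ".join(f"{col}={_quote(val)}" for col, val in spec.items())
        statements.append(f"ALTER TABLE {table} TOUCH PARTITION ({cols});")
    return statements


# Hypothetical table and partitions that were rewritten outside Hive.
for stmt in touch_statements("web_logs", [{"dt": "2015-08-24"}, {"dt": "2015-08-25"}]):
    print(stmt)
```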

Repair table doesnt update the transient_lastDdlTime of updated partitions.

2015-08-25 Thread ravi teja
Hi, I am working towards an incremental solution on Hive based on the transient_lastDdlTime of the partitions. If the we in Thanks, Ravi

Re: Repair table doesnt update the transient_lastDdlTime of updated partitions.

2015-08-25 Thread ravi teja
Sorry for the incomplete mail, sent by mistake. I am working towards an incremental solution on Hive based on the transient_lastDdlTime of the partitions. We mostly deal with Hive external tables. The transient_lastDdlTime of a partition gets updated when the insertion into the table happens via the i
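A minimal Python sketch of the watermark logic behind such an incremental approach. transient_lastDdlTime is stored in the partition parameters as Unix epoch seconds in string form; how the values are collected (for example from DESCRIBE FORMATTED output or a metastore client) is an assumption, so the input here is a hypothetical dict. As this thread points out, a partition whose timestamp is not refreshed will be missed by this filter.

```python
from typing import Dict, List, Tuple


def changed_partitions(ddl_times: Dict[str, str], watermark: int) -> Tuple[List[str], int]:
    """Return partitions whose transient_lastDdlTime is newer than `watermark`.

    `ddl_times` maps a partition name (e.g. "dt=2015-08-25") to its raw
    transient_lastDdlTime value (epoch seconds as a string). The second
    return value is the new watermark to persist for the next run.
    """
    changed = []
    new_watermark = watermark
    for name, raw in sorted(ddl_times.items()):
        ts = int(raw)
        if ts > watermark:
            changed.append(name)
            new_watermark = max(new_watermark, ts)
    return changed, new_watermark


# Hypothetical partition timestamps and last processed watermark.
print(changed_partitions({"dt=2015-08-24": "1440400000", "dt=2015-08-25": "1440500000"}, 1440450000))
```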

Re: Run multiple queries simultaneously

2015-08-25 Thread Raajay
Noam, I am concerned with cases where the network is a bottleneck. Will I be able to control it in YARN? Ideally, I would like to run multiple queries simultaneously. Raajay On Tue, Aug 25, 2015 at 9:31 AM, Noam Hasson wrote: > I would just limit the resources given to the user on YARN. > > On

Re: Using transform

2015-08-25 Thread Manjee, Sunile
You can use TRANSFORM when you use a Python UDF: select transform (column here) using 'python myPythonScript.py' as (column output name here) from YourhiveTable where …. Sunile Manjee From: rakesh sharma <rakeshsharm...@hotmail.com> Reply-To: user@hive.apache.org
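A sketch of what myPythonScript.py could look like. With TRANSFORM, Hive streams each input row to the script's stdin as tab-separated columns (NULL appears as \N by default) and parses each stdout line back into the AS columns, again split on tabs. The transformation itself (upper-casing the first column and appending its length) is purely hypothetical. The script usually has to be shipped with the query first, e.g. ADD FILE myPythonScript.py; followed by select transform (name) using 'python myPythonScript.py' as (name_upper, name_len) from YourhiveTable.

```python
import sys

NULL = "\\N"  # Hive's default text representation of NULL in streamed rows


def transform_line(line: str) -> str:
    """Map one tab-separated input row to one tab-separated output row.

    Hypothetical logic: upper-case the first column and append its length.
    """
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    if name == NULL:
        return f"{NULL}\t0"
    return f"{name.upper()}\t{len(name)}"


def main() -> None:
    # Hive writes each input row to stdin and reads each stdout line back
    # as one output row.
    for line in sys.stdin:
        sys.stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```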

Re: Run multiple queries simultaneously

2015-08-25 Thread Noam Hasson
I would just limit the resources given to the user on YARN. On Tue, Aug 25, 2015 at 4:21 PM, Raajay wrote: > Hello, > > I want to compare the running time of an query when run alone against the > run time in presence of other queries. > > What is the ideal setup required to run this experiment ?

Re: Data Deleted on Hive External Table

2015-08-25 Thread Peyman Mohajerian
Data was generated in some other cluster; they moved it to S3 and then copied it into the warehouse path on my cluster. I then created a schema over it. You are correct that this would not be the right process, and we had no plans to do this in production; it was a POC. Nevertheless, in my view 'exte

Using transform

2015-08-25 Thread rakesh sharma
What's the use and purpose of TRANSFORM in Hive? Any help is appreciated. Thanks, Rakesh

Re: Data Deleted on Hive External Table

2015-08-25 Thread Jeetendra G
If you put EXTERNAL in the table definition and point INPATH at the original data (where the data is landing from the other source), then how would the data end up in /user/hive/warehouse? Shouldn't /user/hive/warehouse only be populated with data when the table is 'internal'? On Tue, Aug 25, 2015 at 7:33 PM, Pe

Re: Data Deleted on Hive External Table

2015-08-25 Thread Peyman Mohajerian
Hi Jeetendra, What I was originally saying is that if you drop the table, it will delete the data despite the fact that you put 'external' in the definition. I think this behavior is due to the fact that the data is in /user/hive/warehouse and therefore Hive assumes ownership and ignores the 'externa
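Given Peyman's report (his own explanation, not documented Hive behavior) that an EXTERNAL table whose data sat under the warehouse path lost its data on DROP, a cautious pre-flight check could flag such locations before creating or dropping a table. A minimal Python sketch, assuming the table LOCATION is already known (e.g. from DESCRIBE FORMATTED) and the cluster uses the default hive.metastore.warehouse.dir; the paths in the example are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Hive's default value of hive.metastore.warehouse.dir; adjust if your
# cluster overrides it.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def under_warehouse(location: str, warehouse: str = DEFAULT_WAREHOUSE) -> bool:
    """True if an absolute table LOCATION is the warehouse dir or inside it.

    Accepts a bare path or a URI such as hdfs://namenode:8020/path; only
    the path component is compared.
    """
    path = posixpath.normpath(urlparse(location).path)
    root = posixpath.normpath(warehouse)
    return posixpath.commonpath([path, root]) == root


# Hypothetical locations for two external tables.
print(under_warehouse("hdfs://namenode:8020/user/hive/warehouse/poc.db/events"))
print(under_warehouse("/data/landing/events"))
```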

Run multiple queries simultaneously

2015-08-25 Thread Raajay
Hello, I want to compare the running time of a query when run alone against its running time in the presence of other queries. What is the ideal setup required to run this experiment? Should I have two Hive CLIs open and issue queries simultaneously? How do I script such an experiment in Hive? Raajay

Re: Data presentation to consumer layer

2015-08-25 Thread Dr Mich Talebzadeh
Thanks. Impala is an alternative to Hive as I understand. What I am looking for is a tool that I can couple to Hive as the repository. I still think Oracle TimesTen will do the job. On 25/8/2015, "Daniel Haviv" wrote: >Hi, >There is a myriad of solutions, among them: >Impala >Presto >Drill >Ky

Re: Hive over JDBC disable task conversion

2015-08-25 Thread Noam Hasson
Hi Emil, If you are referring to getting back results without running a MapReduce job, then I don't believe it's possible; Hive must run MapReduce for the "Order By" part. Noam. On Thu, Aug 20, 2015 at 6:57 PM, Emil Berglind wrote: > I’m running a Hive query over JDBC in a Java app that I wrote

Re: Loading multiple file format in hive

2015-08-25 Thread Nitin Pawar
You are talking about a 15-minute delay to convert the job, so you have two options: 1) Redesign your table so that you have two partitions with two file formats; load data from one into the other and then clear that partition, so that if you query the data without a partition filter it will read both file form

Re: Data presentation to consumer layer

2015-08-25 Thread Daniel Haviv
Hi, There is a myriad of solutions, among them: Impala, Presto, Drill, Kylin, Tajo. On Tue, Aug 25, 2015 at 10:44 AM, Mich Talebzadeh wrote: > Hi, > > > > My question concerns the means of presenting data to consumer layer from > Hive. > > > > Obviously Hive is very suitable for batch analysis. Howe

Data presentation to consumer layer

2015-08-25 Thread Mich Talebzadeh
Hi, My question concerns the means of presenting data to the consumer layer from Hive. Obviously Hive is very suitable for batch analysis. However, the MapReduce nature of extracting data makes it unlikely to work as a direct-access tool for the consumer layer. So my question is what products are there

Re: HiveServer2 & Kerberos

2015-08-25 Thread Loïc Chanel
It is the case. Would you like me to file a JIRA about it? Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne 2015-08-24 19:24 GMT+02:00 Sergey Shelukhin : > If that is the case it sounds like a bug… > > From: Jary Du > Reply-To: "user@hive.apache.org" > Date

Re: Unsubscribe

2015-08-25 Thread Lefty Leverenz
Mohit, to unsubscribe please send a message to user-unsubscr...@hive.apache.org as described here: Mailing Lists. Thanks. Daniel, is it possible you had a typo in the email address? Or could you have sent the message from a different address than the