Difference between Local Hive Metastore server and a Hive-based Metastore server

2015-12-17 Thread Divya Gehlot
Hi, I am a newbie to Spark and am using 1.4.1. I got confused between a local metastore server and a Hive-based metastore server. Can somebody share the use cases for each, along with the pros and cons? I am using HDP 2.3.2, in which hive-site.xml is already in the Spark configuration directory, that means H

Hive on Spark throw java.lang.NullPointerException

2015-12-17 Thread Jone Zhang
t1 left outer join t_rd_soft_app_pkg_name t2 on (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = 20151217) where t1.ds = 20151217) t3 left outer join ( select pcid,count(1) cnt from t_ed_soft_evillog_molo where ds=20151217 group by pcid ) t4 on t3.pcid=t4.pcid; *And

Re: Hive partition load

2015-12-17 Thread Suyog Parlikar
Thanks Alan for the reply. I have one more question along similar lines: can we move data from one partition to another in a Hive table based on a condition? If yes, what would be an efficient way to do that? Thanks in advance. Regards, Suyog On Dec 17, 2015 11:43 PM, "Alan Gates" wrote: >

Re: Discussion: permanent UDF with database name

2015-12-17 Thread jipengz...@meilishuo.com
@Furcy Pin I agree with your idea! I found that since hive-0.13 users can define permanent UDFs, but each one must be bound to a database name. So if we want to use the UDF without a database name, we must create it in every database. That leads to another problem: when we create a new database, we need to get all o

Is there any documentation of how the field delimiter is specified?

2015-12-17 Thread Toby Allsopp
What we want to do is to generate the CREATE TABLE statement for a delimited file where the delimiter has been specified by the user. That is, given a character with ASCII code C, how should we generate the FIELDS TERMINATED BY '?' clause? Is it correct to convert to octal and say '\ooo'? We're
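A minimal sketch of the octal approach the question describes, not a confirmed answer to it. Hive DDL examples commonly write the default Ctrl-A delimiter as '\001'; whether every ASCII code is accepted in this form is exactly what the poster is asking and is assumed here. The function name fields_terminated_by is hypothetical.

```python
def fields_terminated_by(code: int) -> str:
    """Build a FIELDS TERMINATED BY clause from an ASCII code.

    Assumption: a three-digit octal escape ('\\ooo') is accepted for
    the delimiter, which is the open question in the thread.
    """
    if not 0 <= code <= 127:
        raise ValueError("expected an ASCII code in the range 0..127")
    # {:03o} formats the code as zero-padded, three-digit octal.
    return "FIELDS TERMINATED BY '\\{:03o}'".format(code)


# Example delimiters: Ctrl-A (1), tab (9), and comma (44).
clause_ctrl_a = fields_terminated_by(1)
clause_tab = fields_terminated_by(9)
clause_comma = fields_terminated_by(ord(","))

if __name__ == "__main__":
    for clause in (clause_ctrl_a, clause_tab, clause_comma):
        print(clause)
```

The range check rejects non-ASCII codes outright, since the question only covers single ASCII characters.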

RE: Synchronizing Hive metastores across clusters

2015-12-17 Thread Mich Talebzadeh
Hi Elliot. Strictly speaking, I believe your question is about when the metastore in the replica gets out of sync. So any query against the cloud table will only show, say, partitions at time T0 as opposed to T1? I don’t know what your metastore is on. With ours on Oracle this can happe

Re: Hive on Spark - Error: Child process exited before connecting back

2015-12-17 Thread Xuefu Zhang
These missing classes are in hadoop jar. If you have HADOOP_HOME set, then they should be in Hive classpath. --Xuefu On Thu, Dec 17, 2015 at 10:12 AM, Ophir Etzion wrote: > it seems like the problem is that the spark client needs FSDataInputStream > but is not included in the hive-exec-1.1.0-cd

Re: Synchronizing Hive metastores across clusters

2015-12-17 Thread Sushanth Sowmyan
Also, while I have not wiki-ized the documentation for the above, I have uploaded slides from talks that I've given in hive user group meetup on the subject, and also a doc that describes the replication protocol followed for the EXIM replication that are attached over at https://issues.apache.org/

Re: Synchronizing Hive metastores across clusters

2015-12-17 Thread Sushanth Sowmyan
Hi, I think that the replication work added with https://issues.apache.org/jira/browse/HIVE-7973 is exactly up this alley. Per Eugene's suggestion of MetaStoreEventListener, this replication system plugs into that and gets you a stream of notification events from HCatClient for the exact purpose

Re: Synchronizing Hive metastores across clusters

2015-12-17 Thread Eugene Koifman
Metastore supports MetaStoreEventListener and MetaStorePreEventListener, which may be useful here. Eugene From: Elliot West <tea...@gmail.com> Reply-To: "user@hive.apache.org" <user@hive.apache.org> Date: Thursday, December 17, 2015 at 8:21 AM To: "user@

Re: Hive partition load

2015-12-17 Thread Alan Gates
Yes, you can load different partitions simultaneously. Alan. Suyog Parlikar December 17, 2015 at 5:02 Hello everyone, Can we load different partitions of a Hive table simultaneously? Are there any locking issues with that, and if so, what are they? Please find below

Re: Hive on Spark - Error: Child process exited before connecting back

2015-12-17 Thread Ophir Etzion
It seems like the problem is that the Spark client needs FSDataInputStream, but it is not included in the hive-exec-1.1.0-cdh5.4.3.jar that is passed in the class path. I need to look more in spark-submit / org.apache.spark.deploy to see if there is a way to include more jars. 2015-12-17 17:34:01,679

Re: Synchronizing Hive metastores across clusters

2015-12-17 Thread Jörn Franke
Hive has the export/import commands; alternatively, Falcon + Oozie > On 17 Dec 2015, at 17:21, Elliot West wrote: > > Hello, > > I'm thinking about the steps required to repeatedly push Hive datasets out > from a traditional Hadoop cluster into a parallel cloud based cluster. This > is not a one

Re: Synchronizing Hive metastores across clusters

2015-12-17 Thread Elliot West
Hi Mich, In your scenario, is there any coordination of data syncing on HDFS and metadata in HCatalog? I.e. could a situation occur where the replicated metastore shows a partition as 'present' yet the data that backs the partition in HDFS has not yet arrived at the replica filesystem? I imagine on

Re: Synchronizing Hive metastores across clusters

2015-12-17 Thread Elliot West
Hi Mich, Thanks for your reply. The cloud cluster is to be used for read-only analytics, so effectively one-way, stand-by. I'll take a look at your suggested technologies as I'm not familiar with them. Thanks - Elliot. On 17 December 2015 at 16:57, Mich Talebzadeh wrote: > Sounds like one way

RE: Synchronizing Hive metastores across clusters

2015-12-17 Thread Mich Talebzadeh
Sounds like one-way replication of the metastore. Depending on your metastore platform, that could be achieved pretty easily. Mine is Oracle and I use Materialised View replication, which is pretty good but not the latest technology. Others would be GoldenGate or SAP replication server. HTH, Mi

RE: Synchronizing Hive metastores across clusters

2015-12-17 Thread Mich Talebzadeh
Are both clusters in active/active mode, or is the cloud-based cluster a standby? From: Elliot West [mailto:tea...@gmail.com] Sent: 17 December 2015 16:21 To: user@hive.apache.org Subject: Synchronizing Hive metastores across clusters Hello, I'm thinking about the steps required to repeat

Synchronizing Hive metastores across clusters

2015-12-17 Thread Elliot West
Hello, I'm thinking about the steps required to repeatedly push Hive datasets out from a traditional Hadoop cluster into a parallel cloud based cluster. This is not a one off, it needs to be a constantly running sync process. As new tables and partitions are added in one cluster, they need to be s

Re: increase number of reducers

2015-12-17 Thread Muni Chada
Is this table bucketed? If so, please set the number of reducers (set mapreduce.job.reduces=bucket_size) to match the table's bucket size. On Thu, Dec 17, 2015 at 1:25 AM, Awhan Patnaik wrote: > 3 node cluster with 15 gigs of RAM per node. Two tables L is approximately > 1 Million rows, U is

Fwd: problem with hive.reloadable.aux.jars.path

2015-12-17 Thread Justyna
Hi, I wanted to use hiveserver without restarting it for every auxiliary jar change. Following https://issues.apache.org/jira/browse/HIVE-7553, I switched the jars containing UDFs in the folder specified by hive.reloadable.aux.jars.path. I executed the reload command via beeline. It turne

Hive partition load

2015-12-17 Thread Suyog Parlikar
Hello everyone, Can we load different partitions of a Hive table simultaneously? Are there any locking issues with that, and if so, what are they? Please find an example below for more details. Consider a Hive table test with two partitions, p1 and p2. I want to load the data into partition p1 and p

Discussion: permanent UDF with database name

2015-12-17 Thread Furcy Pin
Hi Hive users, I would like to pursue the discussion that happened during the design of the feature: https://issues.apache.org/jira/browse/HIVE-6167 Some concerns were raised back then, and I think that now that it has been implemented, some user feedback could bring grist to the mill. Ev