Hive on Spark

2015-10-22 Thread Jone Zhang
1. How can I set the Storage Level when I use Hive on Spark?
2. Is there any plan to dynamically choose between Hive on MapReduce and Hive on Spark, based on SQL features?
Thanks in advance. Best regards.

Re: Hive on Spark

2015-10-23 Thread Jone Zhang
…command.
> 2. No, you have to make the call.
>
> On Thu, Oct 22, 2015 at 10:32 PM, Jone Zhang wrote:
>> 1. How can I set the Storage Level when I use Hive on Spark?
>> 2. Is there any plan to dynamically choose between Hive on MapReduce or Hive…
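
For reference, a minimal sketch of making that call manually, per session or per query (my_table is a hypothetical table):

    set hive.execution.engine=mr;     -- run this query on MapReduce
    select count(1) from my_table;

    set hive.execution.engine=spark;  -- run this query on Spark
    select count(1) from my_table;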

Re: Hive on Spark

2015-10-23 Thread Jone Zhang
…on Spark uses memory+disk for the storage level.
>
> On Fri, Oct 23, 2015 at 4:29 AM, Jone Zhang wrote:
>> 1. But there is no way to set the Storage Level through a properties file in Spark;
>> Spark provides the "def persist(newLevel: StorageLevel)" API only…

Re: Hive on Spark NPE at org.apache.hadoop.hive.ql.io.HiveInputFormat

2015-11-01 Thread Jone Zhang
Please check and attach the application master log.
2015-11-02 8:03 GMT+08:00 Jagat Singh:
> Hi,
> I am trying to run Hive on Spark on the HDP 2.3 virtual machine,
> following the wiki at
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
> I have replaced all the occurr…

Do you have more suggestions on when to use Hive on MapReduce or Hive on Spark?

2015-11-04 Thread Jone Zhang
Hi, Xuefu. We plan to move from Hive on MapReduce to Hive on Spark selectively. Because the hardware configuration of the compute nodes in the cluster is uneven, we finally chose the following settings:

    spark.dynamicAllocation.enabled true
    spark.shuffle.service.enabled true
    spark.dynami…
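
For reference, a hedged sketch of what the full dynamic-allocation block typically looks like when set through Hive (the executor bounds are illustrative, not the values from the original message):

    set spark.dynamicAllocation.enabled=true;
    set spark.shuffle.service.enabled=true;
    -- illustrative bounds; tune these to the queue's capacity
    set spark.dynamicAllocation.minExecutors=10;
    set spark.dynamicAllocation.maxExecutors=100;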

Re: Building Spark to use for Hive on Spark

2015-11-19 Thread Jone Zhang
I should add that Spark 1.5.0+ uses Hive 1.2.1 by default when you build with -Phive. So that page should read: "Note that you must have a version of Spark which does *not* include the Hive jars if you use Spark 1.…

Re: Hive version with Spark

2015-11-19 Thread Jone Zhang
-Phive is enough; -Phive will use Hive 1.2.1 by default on Spark 1.5.0+.
2015-11-19 4:50 GMT+08:00 Udit Mehta:
> As per this link:
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started,
> you need to build Spark without Hive.
> On Wed, Nov 18, 2015 at 8:50 AM, Sof…

Java heap space error occurred when the amount of data with the same key is very large in a join SQL

2015-11-26 Thread Jone Zhang
Here is an error message:

    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2245)
        at java.util.Arrays.copyOf(Arrays.java:2219)
        at java.util.ArrayList.grow(ArrayList.java:242)
        at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
        at java.util.ArrayList.e…

Re: Java heap space error occurred when the amount of data with the same key is very large in a join SQL

2015-11-28 Thread Jone Zhang
To add a little: the Hive version is 1.2.1, the Spark version is 1.4.1, and the Hadoop version is 2.5.1.
2015-11-26 20:36 GMT+08:00 Jone Zhang:
> Here is an error message:
>
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2245)
>     at java.uti…
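
One possible mitigation to try for a heavily skewed join key, sketched with standard Hive skew-join properties (whether runtime skew handling helps this particular query on the Spark engine is an assumption):

    -- split rows whose key count exceeds the threshold into separate tasks
    set hive.optimize.skewjoin=true;
    set hive.skewjoin.key=100000;  -- illustrative threshold, in rows per key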

Why are there two different stages for the same query when I use Hive on Spark?

2015-12-03 Thread Jone Zhang
Hive 1.2.1 on Spark 1.4.1
The first query is:

    set mapred.reduce.tasks=100;
    use u_wsd;
    insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151202)
    select t1.uin, t1.clientip
    from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
    left outer join (select uin,…

Re: Why are there two different stages for the same query when I use Hive on Spark?

2015-12-03 Thread Jone Zhang
…Thanks.
2015-12-03 22:17 GMT+08:00 Xuefu Zhang:
> Can you also attach the explain query result? What's your data format?
>
> --Xuefu
>
> On Thu, Dec 3, 2015 at 12:09 AM, Jone Zhang wrote:
>> Hive 1.2.1 on Spark 1.4.1
>> The first query is:
>> set…
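
A minimal sketch of producing what Xuefu asked for: run explain on each of the two queries under the Spark engine and compare the stage breakdown (the second subquery below is a hypothetical stand-in, since the original query was truncated):

    set hive.execution.engine=spark;
    explain
    select t1.uin, t1.clientip
    from (select uin, clientip from t_sd_ucm_cominfo_FinalResult where ds=20151202) t1
    left outer join (select uin from t_sd_ucm_cominfo_FinalResult where ds=20151201) t2
    on t1.uin = t2.uin;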

Re: Managed to make Hive run on Spark engine

2015-12-07 Thread Jone Zhang
More and more people are beginning to use Hive on Spark. Congratulations!
2015-12-07 9:12 GMT+08:00 Link Qian:
> congrats!
>
> Link Qian
>
> Date: Sun, 6 Dec 2015 15:44:58 -0500
> Subject: Re: Managed to make Hive run on Spark engine
> From: leftylever...@gmail.c…

Re: Fw: Managed to make Hive run on Spark engine

2015-12-08 Thread Jone Zhang
You can search last month's mailing list for "Do you have more suggestions on when to use Hive on MapReduce or Hive on Spark?" I hope it helps a little. Best wishes.
2015-12-08 6:18 GMT+08:00 Ashok Kumar:
> This is great news, sir. It shows perseverance pays at last.
>
> Can you infor…

Hive on Spark application will be submitted multiple times when the queue resources are not enough.

2015-12-09 Thread Jone Zhang
Hi, Xuefu:
See attachment 1. When the queue resources are not enough, the application application_1448873753366_121022 will be pending. Two minutes later, the application application_1448873753366_121055 will be submitted and will also pend. And then application_1448873753366_121062. See attachme…

Re: Hive on Spark application will be submitted multiple times when the queue resources are not enough.

2015-12-09 Thread Jone Zhang
(hive.spark.client.server.connect.timeout is set to 5 min.) Thanks. Best wishes.
2015-12-09 17:56 GMT+08:00 Jone Zhang:
> Hi, Xuefu:
> See attachment 1. When the queue resources are not enough,
> the application application_1448873753366_121022 will be pending.
> Two minutes late…
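
For reference, the timeout mentioned above can be raised like this, assuming the plain number is interpreted in the property's default unit of milliseconds (300000 ms = 5 min, matching the setting described in the message):

    set hive.spark.client.server.connect.timeout=300000;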

Re: Hive on Spark application will be submitted multiple times when the queue resources are not enough.

2015-12-09 Thread Jone Zhang
2015-12-09 19:22 GMT+08:00 Jone Zhang:
> The Hive version is 1.2.1
> The Spark version is 1.4.1
> The Hadoop version is 2.5.1
>
> The application_1448873753366_121062 will succeed in the above mail.
>
> But in some cases all of the applications will fail, caused by the SparkContext…

Re: Hive on Spark application will be submitted multiple times when the queue resources are not enough.

2015-12-09 Thread Jone Zhang
…last_lc, guid, sn, vn, vc, mo, rl, os, rv, qv, imei, romid, bn, account_type, account
FROM t_ad_tms_heartbeat_ok
WHERE last_date > 20150611 AND ds = 20151207) b
ON a.qimei = b.qimei;
Thanks. Best wishes.
2015-12-09 19:51 GMT+08:00 Jone Zhang:
> But in some cas…

Hive on Spark throws java.lang.NullPointerException

2015-12-17 Thread Jone Zhang
My query is:

    set hive.execution.engine=spark;
    select t3.pcid, channel, version, ip, hour, app_id, app_name, app_apk, app_version, app_type,
           dwl_tool, dwl_status, err_type, dwl_store, dwl_maxspeed, dwl_minspeed, dwl_avgspeed,
           last_time, dwl_num,
           (case when t4.cnt is null then 0 else 1 end) as is_evil
    from (select /…

Re: "java.lang.RuntimeException: Reduce operator initialization failed" when running hive on spark

2015-12-20 Thread Jone Zhang
I also encountered the same problem. The error log in the Spark UI is as follows:

    Job aborted due to stage failure: Task 465 in stage 12.0 failed 4 times, most recent failure:
    Lost task 465.3 in stage 12.0 (TID 6732, 10.148.147.52): java.lang.RuntimeException:
    org.apache.hadoop.hive.ql.metadata.HiveE…

It seems that the result of Hive on Spark is wrong, and the results of Hive and Hive on Spark are not the same

2015-12-22 Thread Jone Zhang
select * from staff;
1 jone 22 1
2 lucy 21 1
3 hmm 22 2
4 james 24 3
5 xiaoliu 23 3

select id, date_ from trade union all select id, "test" from trade;
1 201510210908
2 201509080234
2 201509080235
1 test
2 test
2 test

set hive.execution.engine=spark;
set spark.master=local;
select /*+map…

Re: It seems that the result of Hive on Spark is wrong, and the results of Hive and Hive on Spark are not the same

2015-12-22 Thread Jone Zhang
Hive 1.2.1 on Spark 1.4.1
2015-12-22 19:31 GMT+08:00 Jone Zhang:
> select * from staff;
> 1 jone 22 1
> 2 lucy 21 1
> 3 hmm 22 2
> 4 james 24 3
> 5 xiaoliu 23 3
>
> select id, date_ from trade union all select id, "test" from trade;
> 1 201510210908
> …

How to ensure that the record values of Hive on MapReduce and Hive on Spark are completely consistent?

2016-01-07 Thread Jone Zhang
We made a comparison of the number of records between Hive on MapReduce and Hive on Spark, and they are in good agreement. But how can we ensure that the record values of Hive on MapReduce and Hive on Spark are completely consistent? Do you have any suggestions? Best wishes. Thanks.
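
One hedged way to compare values rather than just counts: write each engine's output into its own table and compare an order-insensitive checksum (result_mr, result_spark, and the column list are hypothetical):

    -- run the same query once per engine into the two tables first
    select sum(hash(id, value)) from result_mr;
    select sum(hash(id, value)) from result_spark;

Equal sums strongly suggest, though do not prove, that the row sets are identical.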

Re: How to ensure that the record values of Hive on MapReduce and Hive on Spark are completely consistent?

2016-01-07 Thread Jone Zhang
2016-01-08 11:37 GMT+08:00 Jone Zhang:
> We made a comparison of the number of records between Hive on MapReduce
> and Hive on Spark, and they are in good agreement.
> But how can we ensure that the record values of Hive on MapReduce and Hive on
> Spark are completely consistent?

Two results are inconsistent when I use Hive on Spark

2016-01-26 Thread Jone Zhang
I have run a query many times, and it returns one of two results with no regularity. One is 36834699 and the other is 18464706. The query is:

    set spark.yarn.queue=soft.high;
    set hive.execution.engine=spark;
    select /*+mapjoin(t3,t4,t5)*/ count(1)
    from (
      select coalesce(t11.qua, t12.qua, t13.qua) qua…

Re: Two results are inconsistent when I use Hive on Spark

2016-01-26 Thread Jone Zhang
Some properties in hive-site.xml are:

    hive.ignore.mapjoin.hint false
    hive.auto.convert.join true
    hive.auto.convert.join.noconditionaltask true

If more information is required, please let us know. Thanks.
2016-01-27 15:20 GMT+08:00 Jone Zhang:
> I have run a query many times, th…
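
A hedged debugging step rather than a confirmed fix: turn off the automatic map-join conversion listed above and rerun the query several times, to see whether the inconsistency follows the map join:

    set hive.auto.convert.join=false;
    set hive.auto.convert.join.noconditionaltask=false;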

How to create an auto-increment key for a table in Hive?

2017-04-11 Thread Jone Zhang
The Hive table is written to by many people. How can I create an auto-increment key for a table in Hive? For example:

    create table test (id, value)
    load data v1 v2 into table test
    load data v3 v4 into table test
    select * from test
    1 v1
    2 v2
    3 v3
    4 v4
    ...

Thanks
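
Hive has no native auto-increment column, so a common sketch is row_number() offset by the table's current maximum id; note this is only safe when a single writer loads at a time (staging and the column names are hypothetical):

    insert into table test
    select m.max_id + row_number() over (order by s.value) as id, s.value
    from staging s
    cross join (select coalesce(max(id), 0) as max_id from test) m;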

Fwd: Why spark.sql.autoBroadcastJoinThreshold is not available

2017-05-11 Thread Jone Zhang
Solved it by removing the "lazy" identifier.
2. HiveContext.sql("cache table feature as select * from src where ..."), whose result size is only 100K.
-- Forwarded message --
From: Jone Zhang
Date: 2017-05-10 19:10 GMT+08:00
Subject: Why spark.sql.autoBroadcastJoinThreshold not av…
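
For reference, a minimal sketch of the two pieces mentioned above in Spark SQL (the threshold value and the filter predicate are illustrative, not taken from the original message):

    -- raise the broadcast threshold to ~100 MB so the small side is broadcast
    set spark.sql.autoBroadcastJoinThreshold=104857600;
    -- materialize the filtered table eagerly (no LAZY keyword)
    cache table feature as select * from src where ds = 20170510;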

How can I merge multiple rows into one row in Spark SQL or Hive SQL?

2017-05-15 Thread Jone Zhang
For example:
Data1 (has 1 billion records)
user_id1 feature1
user_id1 feature2

Data2 (has 1 billion records)
user_id1 feature3

Data3 (has 1 billion records)
user_id1 feature4
user_id1 feature5
...
user_id1 feature100

I want to get the result as follows:
user_id1 feature1 feature2 feature3 featu…
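
A hedged sketch of the usual approach: union the sources and aggregate with collect_list, concatenating the features into one row per user (table and column names follow the example above; only three of the sources are shown):

    select user_id,
           concat_ws(' ', collect_list(feature)) as features
    from (
      select user_id, feature from data1
      union all
      select user_id, feature from data2
      union all
      select user_id, feature from data3
    ) t
    group by user_id;

Note that collect_list does not guarantee the order of the features; wrapping it in sort_array(collect_list(feature)) is one option if a stable order matters.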

We hope you will give our product a wonderful name

2017-09-08 Thread Jone Zhang
We have built an ML platform based on open-source frameworks like Hadoop, Spark, and TensorFlow. Now we need to give our product a wonderful name, and we are eager for everyone's advice. Any answers will be greatly appreciated. Thanks.