Re: Tez reducer parallelism ..

2016-03-15 Thread Gautam
> The windowing is not simultaneous unless they are all over the same window
> - the following query has 3 different windows applied over the same rows
> sequentially.

Ok. Just wanted to confirm. Maybe I could restructure my query to get more parallelism ..

> They are all over the same rows so th

Re: Tez reducer parallelism ..

2016-03-15 Thread Gopal Vijayaraghavan
> A lot of our queries do the following style of simultaneous windowing ..

The windowing is not simultaneous unless they are all over the same window - the following query has 3 different windows applied over the same rows sequentially.

> SELECT
>    row_number() OVER( PARTITION BY app, user,
>    t
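A hedged illustration of Gopal's distinction, with hypothetical table and column names: window functions that share an identical PARTITION BY / ORDER BY spec can be evaluated together over one shuffle, while differing specs are applied one after another over the same rows.

  -- one shared window spec: the functions are evaluated together
  SELECT
    row_number() OVER ( PARTITION BY app, user, type ORDER BY ts ) AS rn,
    rank()       OVER ( PARTITION BY app, user, type ORDER BY ts ) AS rk
  FROM events;

  -- different window specs: applied sequentially over the same rows
  SELECT
    row_number() OVER ( PARTITION BY app  ORDER BY ts ) AS rn,
    rank()       OVER ( PARTITION BY user ORDER BY ts ) AS rk
  FROM events;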

Re: Issue with Star schema

2016-03-15 Thread Thejas Nair
As suggested, looking at the explain plan should tell you if the map-join is getting used. Using a recent version with Hive on Tez would also give you a further speedup, as map-joins are optimized further there.

On Tue, Mar 15, 2016 at 9:32 AM, sreebalineni . wrote:
> You can think of map joins. If clus
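A minimal sketch of the check being suggested; the table and join-key names are hypothetical. A "Map Join Operator" in the plan output means the map-join kicked in; a plain "Join Operator" in the reduce operator tree means a shuffle join was used instead.

  EXPLAIN
  SELECT a.col1, b.col1
  FROM fact_a a
  JOIN dim_b b ON a.dim_key = b.dim_key;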

Tez reducer parallelism ..

2016-03-15 Thread Gautam
Hello, I'm trying to optimize some queries in Hive that were recently switched to Tez .. had a general question regarding reducer parallelism .. A lot of our queries do the following style of simultaneous windowing ..

SELECT
  row_number() OVER( PARTITION BY app, user, type ORDER BY ts ) as
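A hedged sketch of the overall query shape being described; only the first window function survives in the archived preview, so the remaining functions, aliases, and the table name here are hypothetical:

  SELECT
    row_number() OVER ( PARTITION BY app, user, type ORDER BY ts ) AS rn,
    rank()       OVER ( PARTITION BY app, type       ORDER BY ts ) AS rk,
    count(*)     OVER ( PARTITION BY user            ORDER BY ts ) AS cnt
  FROM events;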

Re: Issue with Star schema

2016-03-15 Thread Mich Talebzadeh
How about using Hive on Spark, with A as your fact table and the rest of your tables as dimensions? 20 million rows is not that big. Is your fact table partitioned and, more importantly, clustered by your dimensional keys?

CLUSTERED BY ( prod_id, cust_id, time_id, channel_id, promo_i
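A minimal DDL sketch of a fact table clustered on its dimension keys, in the spirit of the quoted CLUSTERED BY clause; the table name, column types, partition column, and bucket count are all hypothetical:

  CREATE TABLE sales_fact (
    prod_id    BIGINT,
    cust_id    BIGINT,
    time_id    BIGINT,
    channel_id BIGINT,
    promo_id   BIGINT,
    amount     DECIMAL(10,2)
  )
  PARTITIONED BY (sale_date STRING)
  CLUSTERED BY (prod_id, cust_id, time_id, channel_id, promo_id) INTO 32 BUCKETS
  STORED AS ORC;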

Re: Issue with Star schema

2016-03-15 Thread Gopal Vijayaraghavan
> I have a query where I am joining with 10 other entities

Are you using Tez? This looks like an obvious candidate for a broadcast join.

Cheers,
Gopal
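A hedged sketch of the settings that usually control whether Hive converts a shuffle join into a broadcast (map) join on Tez; the size threshold is an illustrative value, not a recommendation:

  SET hive.execution.engine=tez;
  SET hive.auto.convert.join=true;
  SET hive.auto.convert.join.noconditionaltask=true;
  -- combined size (bytes) of the small tables allowed to be broadcast
  SET hive.auto.convert.join.noconditionaltask.size=268435456;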

Re: Issue with Star schema

2016-03-15 Thread sreebalineni .
You can think of map joins. If the cluster is configured with the defaults, it must be happening already; check the query profile.

On Tue, 15 Mar 2016 21:12 Himabindu sanka, wrote:
> Hi Team,
>
> I have a query where I am joining with 10 other entities
>
> Like
>
> Select a.col1,b1.col1,b2.col1 from
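A hedged sketch of the join shape in the quoted question, with hypothetical table and key names; when the dimension tables are small enough, each of these joins is a map-join candidate:

  SELECT a.col1, b1.col1, b2.col1
  FROM a
  JOIN b1 ON a.b1_key = b1.id
  JOIN b2 ON a.b2_key = b2.id;
  -- ... repeated for the remaining dimension tables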

Re: hive read/write hbase

2016-03-15 Thread songj songj
I entered 'add jar /home/hadoop/apache-hive-1.0.1-bin/lib/hive-hbase-handler-1.0.1.jar' but it still throws the same exception. My HBase is 1.1.1 and Hive is 1.0.1; maybe the version of the handler jar is wrong? Why does the Hive cluster need the HDFS (hdfs://A) of the HBase cluster?

Caused by: java.lang.IllegalA
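A minimal sketch of the two usual ways to put the HBase storage handler on Hive's classpath; the path is the one from the message, and it (together with the matching HBase client jars) would need to line up with the HBase version actually running:

  -- per session, from the Hive CLI or beeline
  ADD JAR /home/hadoop/apache-hive-1.0.1-bin/lib/hive-hbase-handler-1.0.1.jar;

  -- or permanently, by listing the same jar in hive.aux.jars.path
  -- in hive-site.xml, so HiveServer2 and launched tasks also see it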

hiveOnSpark blocked at SparkClientFactory.createClient

2016-03-15 Thread
Hi Xuefu and all, recently we have been doing concurrency testing, and we found that submit requests are blocked at the SparkClientFactory.createClient method; there is a synchronized lock on the method. I want to know: is this lock required? The thread which holds this lock is doing rpcServe

Does hive metastore service support proxy user access ?

2016-03-15 Thread Jeff Zhang
I tried to access the Hive metastore service using a proxy user, but didn't succeed. I just wonder whether the Hive metastore supports this kind of access?

16/03/15 08:57:57 DEBUG security.UserGroupInformation: PrivilegedAction as:jeff (auth:PROXY) via l...@example.com (auth:KERBEROS) from:org.apache.hadoop.h
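A hedged sketch of the Hadoop-side impersonation configuration that proxy-user access generally depends on; SUPERUSER stands for the short name of the Kerberos principal doing the proxying (truncated as "l..." in the log above), and the wildcard values are illustrative, not a security recommendation:

  <!-- core-site.xml on the metastore host -->
  <property>
    <name>hadoop.proxyuser.SUPERUSER.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.SUPERUSER.groups</name>
    <value>*</value>
  </property>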

Fwd: Inserting into a Hive transactional table using Java code

2016-03-15 Thread Mich Talebzadeh
Hive 2.0
Spark engine 1.3.1
Eclipse Scala IDE build of Eclipse SDK

A simple routine that reads CSV files from a staging directory by creating an external table and inserts into an ORC transactional table. Using beeline from another server, the task finishes pretty quickly. The code is shown below
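A minimal HiveQL sketch of the routine described, with hypothetical table names, columns, and paths; in Hive of this era a transactional target must be stored as ORC, bucketed, and flagged transactional:

  -- external table over the CSV staging directory
  CREATE EXTERNAL TABLE staging_csv (id INT, payload STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/staging/csv';

  -- ORC transactional target
  CREATE TABLE target_orc (id INT, payload STRING)
  CLUSTERED BY (id) INTO 4 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

  INSERT INTO target_orc SELECT id, payload FROM staging_csv;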
