Re: Optimize Hive Query

2016-06-27 Thread Eugene Koifman
1:11 PM To: Gopal Vijayaraghavan mailto:gop...@apache.org>> Cc: "user@hive.apache.org<mailto:user@hive.apache.org>" mailto:user@hive.apache.org>> Subject: Re: Optimize Hive Query Thanks Gopal for your inputs For now I have create NON ACID table and loaded data see

Re: Optimize Hive Query

2016-06-27 Thread Mich Talebzadeh
Hi, Curious to see if this issue been resolved (performance) after compaction? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://t

Re: Optimize Hive Query

2016-06-26 Thread @Sanjiv Singh
Thanks Gopal for your inputs For now I have create NON ACID table and loaded data see below from logs proper group splits happening . 2016-06-25 12:52:00,160 [INFO] [InputInitializer {Map 1} #0] |tez.HiveSplitGenerator|: Number of grouped splits: 512 On compaction issue , Compaction enab

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Thanks Gopal for your inputs. Let me run compaction explicitly on table then see how query works. Let Regards Sanjiv Singh Mob : +091 9990-447-339 On Fri, Jun 24, 2016 at 7:53 PM, Gopal Vijayaraghavan wrote: > > > Yes for this tables, ACID enabled. it has only 256 files for each > >buckets

Re: Optimize Hive Query

2016-06-24 Thread Gopal Vijayaraghavan
> Yes for this tables, ACID enabled. it has only 256 files for each >buckets. these are create only when data initially loaded in this table. Yes, the initial load goes in as an insert DELTA too - that requires another compaction to move into base files. The fact that they haven't been automati

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Hi Vijay, Yes for this tables, ACID enabled. it has only 256 files for each buckets. these are create only when data initially loaded in this table. There is not transaction done after that. I see that all file for buckets are also in equal size. One thing that I am not able to understand that

Re: Optimize Hive Query

2016-06-24 Thread Gopal Vijayaraghavan
> Please help me on thislet me know you need other info. Are the ORC tables fully compacted? Looks like you're running a version of Hive-ACID, which does not perform well without compacting delta files. dfs -ls ; should tell you whether there are any delta_* files in the list. > |

Re: Optimize Hive Query

2016-06-24 Thread Mich Talebzadeh
Hi Sanjiv, Normally when it comes to this, I will try to find the section of the code which cause the largest lag SELECT > sb_gu_key, m_d_key, t_ev_st_dt, > LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY > t_ev_st_dt ) AS LAG_START_DT, > a_z_key, > c_dt, > e_p_dt, > sq_nbr

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Hi Vijay, Please help me on thislet me know you need other info. Regards Sanjiv Singh Mob : +091 9990-447-339 On Thu, Jun 23, 2016 at 12:41 PM, @Sanjiv Singh wrote: > Hi Gopal, > > I am using Tez as execution engine. > > DAG : > > +---

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Hi Mich, I tried the same without any luck. I don't see any improvement. Regards Sanjiv Singh Mob : +091 9990-447-339 On Thu, Jun 23, 2016 at 5:38 PM, @Sanjiv Singh wrote: > Thanks Mich. for your inputs. > > Let me try that as well. Will post response. > > >

RE: Optimize Hive Query

2016-06-23 Thread Markovitz, Dudu
Thanks, I wanted to rule out skewedness over m_d_key,sb_gu_key Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Thursday, June 23, 2016 11:55 PM To: user@hive.apache.org; Markovitz, Dudu ; sanjiv singh (ME) Subject: Re: Optimize Hive Query Hi Dudu, find below query response

Re: Optimize Hive Query

2016-06-23 Thread @Sanjiv Singh
Thanks Mich. for your inputs. Let me try that as well. Will post response.

Re: Optimize Hive Query

2016-06-23 Thread @Sanjiv Singh
23, 2016 at 4:01 AM, Markovitz, Dudu wrote: > Could you also add the results of the following query? > > > > Thanks > > > > Dudu > > > > > > select m_d_key > >,sb_gu_key > >,count (*) as cnt > >

Re: Optimize Hive Query

2016-06-23 Thread Mich Talebzadeh
Funny enough it is pretty close to similar ORC transactional tables I have. Standard with 256 buckets with two columns as below number of distinct value in column m_d_key : 29 > number of distinct value in column sb_gu_key : 15434343 You have also vectorised data taking 1024 rows at once. Still

Re: Optimize Hive Query

2016-06-23 Thread @Sanjiv Singh
Hi Mich , Please find below output of command. desc formatted tuning_dd_key ; +---+---+---+--+ | col_name| data_type

Re: Optimize Hive Query

2016-06-23 Thread Jörn Franke
The query looks a little bit too complex from what it is supposed to do. Can you reformulate and restrict the data in a where clause (highest restriction first). Another hint would be to use the Orc format (with indexes and optionally bloom filters) with snappy compression as well as sorting the

Re: Optimize Hive Query

2016-06-23 Thread Mich Talebzadeh
Do you also have the output from desc formatted tuning_dd_key and send the output please? Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * htt

Re: Optimize Hive Query

2016-06-23 Thread @Sanjiv Singh
Hi Gopal, I am using Tez as execution engine. DAG : ++--+ | Explain | +-+--+ | Pla

RE: Optimize Hive Query

2016-06-23 Thread Markovitz, Dudu
- From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan Sent: Thursday, June 23, 2016 9:45 AM To: user@hive.apache.org Subject: Re: Optimize Hive Query > Long running query : Are you running this on MapReduce or Tez? Please post the output

Re: Optimize Hive Query

2016-06-22 Thread Gopal Vijayaraghavan
> Long running query : Are you running this on MapReduce or Tez? Please post the output of explain - if you are seeing > 1 shuffle edge in your query while having only one window for OVER(), that might be the reason. OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt) The multipl

Optimize Hive Query

2016-06-22 Thread @Sanjiv Singh
Hi All, I am running performance issue with below query. Its took 2-3 hours to complete in hive. Try tried to partition and bucketing changes on this tables, but without luck. Please help me in optimizing this query. what schema level changes can be done ? other parameters recommendations ? *

Re: optimize hive query for multitable join where one table is huge

2014-03-28 Thread Chinna Rao Lalam
Hi, In hive different types of joins are there like join, map join , bucket map join and etc. Please take a look of these it may help you to optimize your query https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization https://www.facebook.com/notes/facebook-engineering/j

optimize hive query for multitable join where one table is huge

2014-03-26 Thread Srinivasan Ramaswamy
I have a join query where i am joining huge tables and i am trying to optimize this hive query. INSERT OVERWRITE TABLE result SELECT /*+ STREAMTABLE(product) */ i.IMAGE_ID, p.PRODUCT_NO, p.STORE_NO, p.PRODUCT_CAT_NO, p.CAPTION, p.PRODUCT_DESC, p.IMAGE1_ID, p