Re: How does tez calculate the number of Mappers/Reducers?

2016-06-24 Thread Gopal Vijayaraghavan
>Do you know how the number of splits is calculated? To do that properly needs a whiteboard and a couple of hours - with the primary complex variable being the YARN headroom calculation. The simplest way to put it would be that it computes splits, tries to find out the available capacity and trie

Re: How does tez calculate the number of Mappers/Reducers?

2016-06-24 Thread Long, Andrew
Ah, that makes sense. Thanks again for all the help. Do you know how the number of splits is calculated? I also noticed a couple of unusual things in our Splits (as seen below). Primarily, getLength() always returns 0L, which I’m guessing is possibly causing other problems as well. Also our getSpli

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Thanks, Gopal, for your input. Let me run compaction explicitly on the table and then see how the query performs. Let Regards Sanjiv Singh Mob : +091 9990-447-339 On Fri, Jun 24, 2016 at 7:53 PM, Gopal Vijayaraghavan wrote: > > > Yes for this tables, ACID enabled. it has only 256 files for each > >buckets

Re: How does tez calculate the number of Mappers/Reducers?

2016-06-24 Thread Gopal Vijayaraghavan
> While our StorageHandler does utilize a SERDE that correctly returns >SerDeStats, it seems like the optimizer is ignoring these values. AFAIK, the stats impl is assumed to be approximate & aggregate and is never used for setting up execution. > Would anyone know how to correctly set these valu

How does tez calculate the number of Mappers/Reducers?

2016-06-24 Thread Long, Andrew
Hello everyone, How does Tez calculate the number of mappers and reducers? We have a custom StorageHandler that, when used with Tez, miscalculates the number of mappers when doing a join. I’ve included an EXPLAIN EXTENDED of a sample query below. One thing I have noticed is that under propert

Re: Optimize Hive Query

2016-06-24 Thread Gopal Vijayaraghavan
> Yes for this tables, ACID enabled. it has only 256 files for each >buckets. these are create only when data initially loaded in this table. Yes, the initial load goes in as an insert DELTA too - that requires another compaction to move into base files. The fact that they haven't been automati

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Hi Vijay, Yes, ACID is enabled for these tables. It has only 256 files for each bucket. These were created only when data was initially loaded into this table. No transactions have been done after that. I see that all files for the buckets are also equal in size. One thing that I am not able to understand that

Re: Optimize Hive Query

2016-06-24 Thread Gopal Vijayaraghavan
> Please help me on this; let me know if you need other info. Are the ORC tables fully compacted? Looks like you're running a version of Hive-ACID, which does not perform well without compacting delta files. dfs -ls ; should tell you whether there are any delta_* files in the list. > |
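The delta check described above could be scripted roughly as follows. This is only a sketch: the function name `find_delta_entries` and the sample listing (including the `/warehouse/t` path) are made up for illustration, and it assumes the listing text has the usual `dfs -ls` shape, with the path as the last whitespace-separated field on each line.

```python
import posixpath
from typing import List


def find_delta_entries(listing: str) -> List[str]:
    """Return paths from a `dfs -ls` style listing whose last component starts with 'delta_'."""
    deltas = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        path = fields[-1]  # assumption: the path is the last field on the line
        if posixpath.basename(path).startswith("delta_"):
            deltas.append(path)
    return deltas


# Hypothetical listing text, not taken from the thread.
sample_listing = """Found 2 items
drwxr-xr-x   - hive hadoop          0 2016-06-20 10:00 /warehouse/t/base_0000005
drwxr-xr-x   - hive hadoop          0 2016-06-21 10:00 /warehouse/t/delta_0000006_0000006
"""

print(find_delta_entries(sample_listing))
```

An empty result would suggest the table has no uncompacted delta directories at that level; any entries returned are candidates for compaction.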

Hash table in map join - Hive

2016-06-24 Thread Ross Guth
1. Is there a way to check the size of the hash table created during a map-side join in Hive/Tez? 2. Is the hash table for the small table created for the entire table or only for the selected and join key columns? 3. The hash table (created in map side join) spills to disk, if it does not fit in memory

Re: Optimize Hive Query

2016-06-24 Thread Mich Talebzadeh
Hi Sanjiv, Normally when it comes to this, I will try to find the section of the code which causes the largest lag SELECT > sb_gu_key, m_d_key, t_ev_st_dt, > LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY > t_ev_st_dt ) AS LAG_START_DT, > a_z_key, > c_dt, > e_p_dt, > sq_nbr

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Hi Vijay, Please help me on this; let me know if you need other info. Regards Sanjiv Singh Mob : +091 9990-447-339 On Thu, Jun 23, 2016 at 12:41 PM, @Sanjiv Singh wrote: > Hi Gopal, > > I am using Tez as execution engine. > > DAG : > > +---

Re: Optimize Hive Query

2016-06-24 Thread @Sanjiv Singh
Hi Mich, I tried the same without any luck. I don't see any improvement. Regards Sanjiv Singh Mob : +091 9990-447-339 On Thu, Jun 23, 2016 at 5:38 PM, @Sanjiv Singh wrote: > Thanks Mich. for your inputs. > > Let me try that as well. Will post response. > > >

Re: RegexSerDe with Filters

2016-06-24 Thread Arun Patel
Dudu, Thanks for the clarification. Looks like I have an issue with my Hive installation. I tried in a different cluster and it works. Thanks again. On Fri, Jun 24, 2016 at 4:59 PM, Markovitz, Dudu wrote: > This is a tested, working code. > > If you’re using https://regex101.com ,first replac

RE: RegexSerDe with Filters

2016-06-24 Thread Markovitz, Dudu
This is tested, working code. If you’re using https://regex101.com, first replace backslash pairs (\\) with a single backslash (\) and also use the ‘g’ modifier in order to find all of the matches. The regular expression is - (\S+)\s+([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}),([
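The backslash step above can be tried outside regex101 with a small Python sketch. It uses only the prefix of the pattern that is visible in this snippet (up to the first comma), written with doubled backslashes as it would appear in a Hive DDL string. The helper name `unescape_hive_regex` and the sample line are assumptions made for illustration, and `re.findall` stands in for the 'g' modifier.

```python
import re

# Visible prefix of the pattern only, with backslashes doubled as in Hive DDL.
# The rest of the expression is cut off in the archive and is not reconstructed here.
hive_escaped = r"(\\S+)\\s+([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}),"


def unescape_hive_regex(pattern: str) -> str:
    """Replace each backslash pair with a single backslash."""
    return pattern.replace("\\\\", "\\")


plain = unescape_hive_regex(hive_escaped)

# Hypothetical input line, not taken from the thread.
sample = "host01   2016-06-24 16:59:00,rest"

# findall returns every non-overlapping match, similar to the 'g' modifier.
matches = re.findall(plain, sample)
print(plain)
print(matches)
```

If the unescaped pattern matches in a tester but the table still returns NULLs, the difference is more likely in the environment than in the expression itself.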

Re: RegexSerDe with Filters

2016-06-24 Thread Arun Patel
Looks like Regex pattern is not working. I tested the pattern on https://regex101.com/ and it does not find any match. Any suggestions? On Thu, Jun 23, 2016 at 3:01 PM, Markovitz, Dudu wrote: > My pleasure. > > Please feel free to reach me if needed. > > > > Dudu > > > > *From:* Arun Patel [ma