Re: Hash table in map join - Hive

2016-07-15 Thread Gopal Vijayaraghavan
> When is OOM error actually thrown? With hive.mapjoin.hybridgrace.hashtable set to true, spilling should be possible, so OOM error should not come. ...
> Is it the case when the hash table of not even one of the 16 partitions fits in memory?
It will OOM if any one of them overflows. The gra
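The reply above says that spilling does not rule out an OOM, because a single partition can still overflow. A minimal Python sketch of that idea follows. It is a toy model, not Hive's implementation: the class `PartitionedHashTable`, the row-count budget, and the "spill the largest partition" policy are all assumptions made for illustration. Only the figure of 16 partitions comes from the thread.

```python
from collections import defaultdict

NUM_PARTITIONS = 16  # the partition count mentioned in the thread


class PartitionedHashTable:
    """Toy model of a partitioned, spillable hash table (hypothetical, not Hive code)."""

    def __init__(self, memory_budget_rows, num_partitions=NUM_PARTITIONS):
        self.budget = memory_budget_rows      # assumed budget, counted in rows
        self.num_partitions = num_partitions
        self.in_memory = defaultdict(list)    # partition id -> rows held in memory
        self.spilled = defaultdict(list)      # partition id -> rows written "to disk"

    def rows_in_memory(self):
        return sum(len(rows) for rows in self.in_memory.values())

    def _spill_largest(self):
        # Assumed policy: evict the in-memory partition holding the most rows.
        pid = max(self.in_memory, key=lambda p: len(self.in_memory[p]))
        self.spilled[pid].extend(self.in_memory.pop(pid))

    def put(self, key, value):
        pid = hash(key) % self.num_partitions
        if pid in self.spilled:
            # Once a partition has spilled, later rows for it go straight to disk.
            self.spilled[pid].append((key, value))
            return
        self.in_memory[pid].append((key, value))
        while self.rows_in_memory() > self.budget:
            self._spill_largest()

    def reload(self, pid):
        # A spilled partition has to fit in memory on its own to be probed.
        rows = self.spilled.get(pid, [])
        if len(rows) > self.budget:
            raise MemoryError(f"partition {pid} has {len(rows)} rows, budget is {self.budget}")
        table = defaultdict(list)
        for key, value in rows:
            table[key].append(value)
        return table
```

With evenly spread keys, spilling keeps the in-memory row count within the budget. With a heavily skewed key, every row lands in one partition, and reloading that partition fails in this model even though spilling was enabled.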

Re: Hash table in map join - Hive

2016-07-15 Thread Lalitha MV
Hi Gopal, Thanks a lot for the above update. I had only one question hanging: When is OOM error actually thrown? With hive.mapjoin.hybridgrace.hashtable set to true, spilling should be possible, so OOM error should not come. Is it the case when the hash table of not even one of the 16 partitions

Re: Hash table in map join - Hive

2016-07-14 Thread Gopal Vijayaraghavan
Hi, I got a chance to re-run this query today and it does auto-reduce the new CUSTOM_EDGE join as well. However it does it too early, before it has got enough information about both sides of the join. TPC-DS Query79 at 1Tb scale generates a CUSTOM_EDGE between the ms alias and customer tables.

Re: Hash table in map join - Hive

2016-07-06 Thread Gopal Vijayaraghavan
> I tried running the shuffle hash join with auto reducer parallelism again. But, it didn't seem to take effect. With merge join and auto reduce parallelism on, number of reducers drops from 1009 to 337, but didn't see that change in case of shuffle hash join. Should I be doing something more

Re: Hash table in map join - Hive

2016-07-01 Thread Lalitha MV
Hi Gopal, Since this jira is resolved, I cloned the master branch, compiled and used the binaries (0.9 snapshot version of tez). I tried running the shuffle hash join with auto reducer parallelism again. But, it didn't seem to take effect. With merge join and auto reduce parallelism on, number of

Re: Hash table in map join - Hive

2016-06-30 Thread Gopal Vijayaraghavan
> But, I got a comment from the author that, the patch wouldn't affect -- hive.tez.auto.reducer.parallelism=true.
> Am I missing something?
No, I've linked to the wrong JIRA :(
Cheers, Gopal

Re: Hash table in map join - Hive

2016-06-30 Thread Lalitha MV
Also, a couple of follow-up questions: 1. The grace hash has to reload/rebuild the hash table for a new split only if it has spilled the hash table because of lack of memory space, right? How does the regular hash join handle the case when the hash table cannot fit into memory? Does it create

Re: Hash table in map join - Hive

2016-06-30 Thread Lalitha MV
Hi, I was following this thread. I tried adding the patch of the jira manually ( https://issues.apache.org/jira/browse/TEZ-3287 ) [referenced in the above reply for auto reducer optimization in shuffle hash join case]. I added it to 0.8.3 while the patch was for the master. But, I got a comment fr

Re: Hash table in map join - Hive

2016-06-30 Thread Gopal Vijayaraghavan
> 1. In the query plan, it still says Map Join Operator (Would have expected it to be named as Reduce side operator).
The "Map" in that case really refers to Map rather than the Hadoop version. An unambiguous name would be HashJoinOperator. This is one of the optimizations of Tez righ

Re: Hash table in map join - Hive

2016-06-29 Thread Ross Guth
Hi Gopal, I saw the log files and the hash table information in it. Thanks. Also, I enforced shuffle hash join. I had a couple of questions around it: 1. In the query plan, it still says Map Join Operator (Would have expected it to be named as Reduce side operator). 2. The edges in this query pl

Re: Hash table in map join - Hive

2016-06-27 Thread Gopal Vijayaraghavan
> 1. OOM condition -- I get the following error when I force a map join in hive/tez with low container size and heap size: "java.lang.OutOfMemoryError: Java heap space". I was wondering what is the condition which leads to this error.
You are not modifying the noconditionaltasksize to match th

Re: Hash table in map join - Hive

2016-06-27 Thread Ross Guth
Hi Gopal, Thanks a lot for the answers. They were helpful. I have a few more questions regarding this: 1. OOM condition -- I get the following error when I force a map join in hive/tez with low container size and heap size: "java.lang.OutOfMemoryError: Java heap space". I was wondering what is th

Re: Hash table in map join - Hive

2016-06-27 Thread Gopal Vijayaraghavan
> 1. Is there a way to check the size of the hash table created during map side join in Hive/Tez?
Only from the log files. However, if you enable hive.tez.exec.print.summary=true, then the Hive CLI will print out the total # of items shuffled from the broadcast edges feeding the hashtable. Not sure

Hash table in map join - Hive

2016-06-24 Thread Ross Guth
1. Is there a way to check the size of the hash table created during map side join in Hive/Tez? 2. Is the hash table (small table's), created for the entire table or only for the selected and join key columns? 3. The hash table (created in map side join) spills to disk, if it does not fit in memory