Re: Hive generating different DAGs from the same query

2018-09-11 Thread Sungwoo Park
Hello Gopal, I have been looking further into this issue, and have found that the non-deterministic behavior of Hive in generating DAGs is actually due to the logic in AggregateStatsCache.findBestMatch() called from AggregateStatsCache.get(), as well as the disproportionate distribution of Nulls in

Re: Hive generating different DAGs from the same query

2018-07-19 Thread Gopal Vijayaraghavan
> My conclusion is that a query can update some internal states of HiveServer2,
> affecting DAG generation for subsequent queries.
Other than the automatic reoptimization feature, there are two other potential suspects. The first one would be to disable the in-memory stats cache's variance param, wh

Fwd: Hive generating different DAGs from the same query

2018-07-19 Thread Sungwoo Park
Hello Zoltan, I tested further and found no exception (such as MapJoinMemoryExhaustionError) during the run. So, the query ran fine. My conclusion is that a query can update some internal states of HiveServer2, affecting DAG generation for subsequent queries. Moreover, the same query may or may n

Re: Hive generating different DAGs from the same query

2018-07-13 Thread Zoltan Haindrich
Hello Sungwoo! I think it's possible that reoptimization is kicking in, because the first execution has bumped into an exception. I think the plans should not be changing permanently, unless "hive.query.reexecution.stats.persist.scope" is set to a wider scope than query. To check that indeed

Hive generating different DAGs from the same query

2018-07-11 Thread Sungwoo Park
Hello, I am running the TPC-DS benchmark using Hive 3.0, and I find that Hive sometimes produces different DAGs from the same query. These are the two scenarios for the experiment. The execution engine is tez, and the TPC-DS scale factor is 3TB. 1. Run query 19 to query 24 sequentially in the sam