Re: Hive generating different DAGs from the same query

2018-09-11 Thread Sungwoo Park
Hello Gopal, I have been looking further into this issue, and have found that the non-deterministic behavior of Hive in generating DAGs is actually due to the logic in AggregateStatsCache.findBestMatch() called from AggregateStatsCache.get(), as well as the disproportionate distribution of Nulls in

Re: Hive generating different DAGs from the same query

2018-07-19 Thread Gopal Vijayaraghavan
> My conclusion is that a query can update some internal states of HiveServer2, affecting DAG generation for subsequent queries.
Other than the automatic reoptimization feature, there are two other potential suspects. The first one would be to disable the in-memory stats cache's variance param, wh

Re: Hive generating different DAGs from the same query

2018-07-13 Thread Zoltan Haindrich
Hello Sungwoo! I think it's possible that reoptimization is kicking in, because the first execution has bumped into an exception. I think the plans should not be changing permanently, unless "hive.query.reexecution.stats.persist.scope" is set to a wider scope than query. To check that indeed