Hello Gopal,
I have been looking further into this issue, and have found that the
non-deterministic behavior of Hive in generating DAGs is actually due to the
logic in AggregateStatsCache.findBestMatch(), called from
AggregateStatsCache.get(), as well as the disproportionate distribution of
Nulls in
> My conclusion is that a query can update some internal states of HiveServer2,
> affecting DAG generation for subsequent queries.
Other than the automatic reoptimization feature, there are two other potential
suspects.
First one would be to disable the in-memory stats cache's variance param, wh
Hello Sungwoo!
I think it's possible that reoptimization is kicking in, because the first
execution has bumped into an exception.
I think the plans should not be changing permanently, unless
"hive.query.reexecution.stats.persist.scope" is set to a wider scope than query.
To check that indeed