Kevin Wilfong created HIVE-3915:
-----------------------------------
Summary: Union with map-only query on one side and two MR job query on the other produces wrong results
Key: HIVE-3915
URL: https://issues.apache.org/jira/browse/HIVE-3915
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
When a query contains a union with a map-only subquery on one side and a
subquery involving two sequential map-reduce jobs on the other, it can produce
wrong results. It appears that if the map-only query's TableScan operator is
processed first, the task involving the union is made a root task. Then, when
the other subquery is processed, the second map-reduce job gains the task
involving the union as a child, and the second job is also made a root task.
This means that both the first and second map-reduce jobs are root tasks, so
the dependency between the two is ignored. If they run in parallel (i.e., when
the cluster has more than one node), the side of the union with the two
map-reduce jobs produces no results, and only the results of the map-only side
of the union are returned.

The order in which the TableScan operators are processed is crucial to
reproducing this bug. Because that order is determined by the order in which
values are retrieved from a map, it is hard to predict, so the bug does not
always reproduce.
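To make the dependency problem concrete, here is a minimal Python sketch of a task DAG. The `Task` class and the `violations` and `true_roots` helpers are hypothetical illustrations. They are not Hive's actual classes and not the fix for this issue. The sketch assumes that the union task contains the map-only side and runs after the second MR job. It also models only the two MR jobs as roots, because the report does not say whether the union task stays in the root list.

```python
class Task:
    """Hypothetical stand-in for a query-plan task; not Hive's real Task class."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []

    def add_child(self, child):
        # `child` must not start until this task has finished.
        self.children.append(child)
        child.parents.append(self)


def violations(root_tasks):
    """Root tasks are launched immediately, so any root that still has a
    parent may run before (or alongside) the job whose output it reads."""
    return [t.name for t in root_tasks if t.parents]


def true_roots(all_tasks):
    """Derive roots from the DAG itself: only tasks with no parents."""
    return [t for t in all_tasks if not t.parents]


# Scenario from the report (names are illustrative).
mr1 = Task("MR1")      # first map-reduce job of the two-job subquery
mr2 = Task("MR2")      # second job, consumes MR1's output
union = Task("UNION")  # task containing the union (map-only side assumed inlined)
mr1.add_child(mr2)
mr2.add_child(union)

# Reported outcome: both map-reduce jobs end up in the root list.
buggy_roots = [mr1, mr2]
print(violations(buggy_roots))                          # ['MR2']
print([t.name for t in true_roots([mr1, mr2, union])])  # ['MR1']
```

Deriving roots from the parent links, rather than promoting tasks to roots as the operator tree is walked, makes the result independent of the order in which the TableScan operators are visited. That order independence is the property the report shows is missing.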