[jira] [Commented] (HIVE-19671) Distribute by rand() can lead to data inconsistency

Rui Li (JIRA) Thu, 21 Jun 2018 06:20:03 -0700


    [ https://issues.apache.org/jira/browse/HIVE-19671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519357#comment-16519357 ]


Rui Li commented on HIVE-19671:
-------------------------------

[~xuefuz], I agree it's not trivial to solve this on the Hive side. Maybe we can
at least print a warning when the query uses nondeterministic partitioning?
Another potential solution is to retry all downstream tasks whenever an
upstream task fails, which needs support from the execution engine.
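
To illustrate why a retried upstream task matters here, below is a rough Python
sketch (a toy model, not Hive or execution-engine code). It assumes the failure
mode is that one reducer reads the first attempt's output while another reducer
reads the re-run attempt's output. All names in it (run_upstream_task,
count_with_retry, the two-reducer setup) are made up for the illustration.

```python
import random


def run_upstream_task(rows, num_reducers, partitioner):
    """Toy upstream task: assign every row to one reducer partition."""
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[partitioner(row) % num_reducers].append(row)
    return partitions


def count_with_retry(rows, partitioner, num_reducers=2):
    """Count rows seen downstream when the upstream task runs twice.

    Assumed scenario: reducer 0 consumes the first attempt's output, the
    output is then lost, and reducer 1 consumes the re-run attempt's output.
    """
    first_attempt = run_upstream_task(rows, num_reducers, partitioner)
    reducer0_input = first_attempt[0]
    second_attempt = run_upstream_task(rows, num_reducers, partitioner)
    reducer1_input = second_attempt[1]
    return len(reducer0_input) + len(reducer1_input)


rows = list(range(100))
rng = random.Random(42)

# Nondeterministic partitioning, in the spirit of "distribute by rand()":
# the second attempt assigns rows differently from the first.
random_counts = [
    count_with_retry(rows, lambda _row: rng.randrange(2)) for _ in range(50)
]

# Deterministic partitioning on the row value: both attempts agree.
stable_counts = [count_with_retry(rows, lambda row: row) for _ in range(50)]

print("distinct counts with random partitioning:", sorted(set(random_counts)))
print("distinct counts with stable partitioning:", sorted(set(stable_counts)))
```

In this model the stable partitioner always yields the true row count, while
the random one can drop or duplicate rows, because the two attempts disagree
about which reducer owns a given row. Re-running every downstream task against
the second attempt would restore consistency in the model.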

> Distribute by rand() can lead to data inconsistency
> ---------------------------------------------------
>
>                 Key: HIVE-19671
>                 URL: https://issues.apache.org/jira/browse/HIVE-19671
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>            Priority: Major
>
> Noticed the following queries can give different results:
> {code}
> select count(*) from tbl;
> select count(*) from (select * from tbl distribute by rand()) a;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
