[jira] [Commented] (HIVE-11583) When PTF is used over a large partitions result could be corrupted

Hive QA (JIRA) Mon, 14 Sep 2015 17:42:45 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744582#comment-14744582 ]


Hive QA commented on HIVE-11583:
--------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12755773/HIVE-11583.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9412 tests executed
*Failed tests:*
{noformat}
TestParseNegative - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5276/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5276/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12755773 - PreCommit-HIVE-TRUNK-Build

> When PTF is used over a large partitions result could be corrupted
> ------------------------------------------------------------------
>
>                 Key: HIVE-11583
>                 URL: https://issues.apache.org/jira/browse/HIVE-11583
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 0.14.0, 0.13.1, 0.14.1, 1.0.0, 1.2.0, 1.2.1
>         Environment: Hadoop 2.6 + Apache hive built from trunk
>            Reporter: Illya Yalovyy
>            Assignee: Illya Yalovyy
>            Priority: Critical
>         Attachments: HIVE-11583.patch
>
>
> Dataset:
>  Window has 50001 records (2 blocks on disk and 1 block in memory)
>  Size of the second block is >32 MB (2 splits)
> Result:
> When the last block is read from disk, only the first split is actually
> loaded. The second split is missed. The total row count of the result dataset
> is correct, but some records are missing and others are duplicated.
> Example:
> {code:sql}
> CREATE TABLE ptf_big_src (
>   id INT,
>   key STRING,
>   grp STRING,
>   value STRING
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> LOAD DATA LOCAL INPATH '../../data/files/ptf_3blocks.txt.gz'
>   OVERWRITE INTO TABLE ptf_big_src;
> SELECT grp, COUNT(1) cnt FROM ptf_big_src GROUP BY grp ORDER BY cnt desc;
> ---
> -- A  25000
> -- B  20000
> -- C  5001
> ---
> CREATE TABLE ptf_big_trg AS
>   SELECT *, row_number() OVER (PARTITION BY key ORDER BY grp) grp_num
>   FROM ptf_big_src;
> SELECT grp, COUNT(1) cnt FROM ptf_big_trg GROUP BY grp ORDER BY cnt desc;
> ---
> -- A  34296
> -- B  15704
> -- C  1
> ---
> {code}
> Counts by 'grp' are incorrect!
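>
> The missing and duplicated records could be located with checks along the
> following lines. This is only a sketch: it assumes that 'id' is unique in
> ptf_big_src, which is not stated above, and the aliases 'copies', 's' and
> 't' are illustrative names.
> {code:sql}
> -- Assumption: id is unique in ptf_big_src.
> -- Rows duplicated by the PTF: an id that appears more than once in the target.
> SELECT id, COUNT(1) copies
> FROM ptf_big_trg
> GROUP BY id
> HAVING COUNT(1) > 1;
> -- Rows lost by the PTF: an id present in the source but absent from the target.
> SELECT s.id
> FROM ptf_big_src s
> LEFT OUTER JOIN ptf_big_trg t ON s.id = t.id
> WHERE t.id IS NULL;
> {code}
> Under that assumption, both queries would return no rows if the result were
> correct.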



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
