[ https://issues.apache.org/jira/browse/HIVE-11583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744582#comment-14744582 ]
Hive QA commented on HIVE-11583:
--------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12755773/HIVE-11583.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9412 tests executed
*Failed tests:*
{noformat}
TestParseNegative - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5276/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5276/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5276/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12755773 - PreCommit-HIVE-TRUNK-Build

> When PTF is used over a large partitions result could be corrupted
> ------------------------------------------------------------------
>
>                 Key: HIVE-11583
>                 URL: https://issues.apache.org/jira/browse/HIVE-11583
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 0.14.0, 0.13.1, 0.14.1, 1.0.0, 1.2.0, 1.2.1
>         Environment: Hadoop 2.6 + Apache hive built from trunk
>            Reporter: Illya Yalovyy
>            Assignee: Illya Yalovyy
>            Priority: Critical
>         Attachments: HIVE-11583.patch
>
>
> Dataset:
>  Window has 50001 records (2 blocks on disk and 1 block in memory)
>  Size of the second block is >32Mb (2 splits)
> Result:
> When the last block is read from disk, only the first split is actually
> loaded. The second split is missed.
> The total count of the result dataset is correct, but some records are
> missing and others are duplicated.
> Example:
> {code:sql}
> CREATE TABLE ptf_big_src (
>   id INT,
>   key STRING,
>   grp STRING,
>   value STRING
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> LOAD DATA LOCAL INPATH '../../data/files/ptf_3blocks.txt.gz' OVERWRITE INTO
> TABLE ptf_big_src;
> -- Counts in the source table, before the windowing query runs
> SELECT grp, COUNT(1) cnt FROM ptf_big_src GROUP BY grp ORDER BY cnt desc;
> ---
> -- A	25000
> -- B	20000
> -- C	5001
> ---
> CREATE TABLE ptf_big_trg AS SELECT *, row_number() OVER (PARTITION BY key
> ORDER BY grp) grp_num FROM ptf_big_src;
> -- Counts in the target table, after the windowing query
> SELECT grp, COUNT(1) cnt FROM ptf_big_trg GROUP BY grp ORDER BY cnt desc;
> ---
> -- A	34296
> -- B	15704
> -- C	1
> ---
> {code}
> Counts by 'grp' are incorrect!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)