[ https://issues.apache.org/jira/browse/HIVE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478717#comment-16478717 ]
Hive QA commented on HIVE-19418: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12923843/HIVE-19418.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 14416 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_stats] (batchId=159) org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testCreateTableNullDatabase[Embedded] (batchId=209) org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testCreateTableNullDatabase[Remote] (batchId=209) org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[5] (batchId=196) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=196) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTinyint (batchId=196) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/11021/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11021/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11021/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12923843 - PreCommit-HIVE-Build > add background stats updater similar to compactor > ------------------------------------------------- > > Key: HIVE-19418 > URL: https://issues.apache.org/jira/browse/HIVE-19418 > Project: Hive > Issue Type: Bug > Components: Transactions > Reporter: Sergey Shelukhin > Assignee: Sergey Shelukhin > Priority: Major > Attachments: HIVE-19418.01.patch, HIVE-19418.02.patch, > HIVE-19418.patch > > > There's a JIRA HIVE-19416 to add snapshot version to stats for MM/ACID tables > to make them usable in a transaction without breaking ACID (for metadata-only > optimization). However, stats for ACID tables can still become unusable if > e.g. two parallel inserts run - neither sees the data written by the other, > so after both finish, the snapshots on either set of stats won't match the > current snapshot and the stats will be unusable. > Additionally, for ACID and non-ACID tables alike, a lot of the stats, with > some exceptions like numRows, cannot be aggregated (i.e. you cannot combine > ndvs from two inserts), and for ACID even less can be aggregated (you cannot > derive min/max if some rows are deleted but you don't scan the rest of the > dataset). > Therefore we will add background logic to metastore (similar to, and > partially inside, the ACID compactor) to update stats. > It will have 3 modes of operation. > 1) Off. > 2) Update only the stats that exist but are out of date (generating stats can > be expensive, so if the user is only analyzing a subset of tables it should > be able to only update that subset). We can simply look at existing stats and > only analyze for the relevant partitions and columns. > 3) On: 2 + create stats for all tables and columns missing stats. > There will also be a table parameter to skip stats update. > In phase 1, the process will operate outside of compactor, and run analyze > command on the table. The analyze command will automatically save the stats > with ACID snapshot information if needed, based on HIVE-19416, so we don't > need to do any special state management and this will work for all table > types. However it's also more expensive. > In phase 2, we can explore adding stats collection during MM compaction that > uses a temp table. If we don't have open writers during major compaction (so > we overwrite all of the data), the temp table stats can simply be copied over > to the main table with correct snapshot information, saving us a table scan. > In phase 3, we can add custom stats collection logic to full ACID compactor > that is not query based, the same way as we'd do for (2). Alternatively we > can wait for ACID compactor to become query based and just reuse (2). -- This message was sent by Atlassian JIRA (v7.6.3#76005)