[jira] [Commented] (HIVE-19418) add background stats updater similar to compactor

Hive QA (JIRA) Thu, 17 May 2018 01:15:13 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478717#comment-16478717
 ]


Hive QA commented on HIVE-19418:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12923843/HIVE-19418.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 14416 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_stats]
 (batchId=159)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testCreateTableNullDatabase[Embedded]
 (batchId=209)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testCreateTableNullDatabase[Remote]
 (batchId=209)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapWithComplexData[5]
 (batchId=196)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=196)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTinyint 
(batchId=196)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/11021/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/11021/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-11021/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12923843 - PreCommit-HIVE-Build

> add background stats updater similar to compactor
> -------------------------------------------------
>
>                 Key: HIVE-19418
>                 URL: https://issues.apache.org/jira/browse/HIVE-19418
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HIVE-19418.01.patch, HIVE-19418.02.patch, 
> HIVE-19418.patch
>
>
> There's a JIRA HIVE-19416 to add snapshot version to stats for MM/ACID tables 
> to make them usable in a transaction without breaking ACID (for metadata-only 
> optimization). However, stats for ACID tables can still become unusable if 
> e.g. two parallel inserts run - neither sees the data written by the other, 
> so after both finish, the snapshots on either set of stats won't match the 
> current snapshot and the stats will be unusable.
> Additionally, for ACID and non-ACID tables alike, a lot of the stats, with 
> some exceptions like numRows, cannot be aggregated (i.e. you cannot combine 
> ndvs from two inserts), and for ACID even less can be aggregated (you cannot 
> derive min/max if some rows are deleted but you don't scan the rest of the 
> dataset).
> Therefore we will add background logic to metastore (similar to, and 
> partially inside, the ACID compactor) to update stats.
> It will have 3 modes of operation.
> 1) Off.
> 2) Update only the stats that exist but are out of date (generating stats can 
> be expensive, so if the user is only analyzing a subset of tables it should 
> be able to only update that subset). We can simply look at existing stats and 
> only analyze for the relevant partitions and columns.
> 3) On: 2 + create stats for all tables and columns missing stats.
> There will also be a table parameter to skip stats update. 
> In phase 1, the process will operate outside of compactor, and run analyze 
> command on the table. The analyze command will automatically save the stats 
> with ACID snapshot information if needed, based on HIVE-19416, so we don't 
> need to do any special state management and this will work for all table 
> types. However it's also more expensive.
> In phase 2, we can explore adding stats collection during MM compaction that 
> uses a temp table. If we don't have open writers during major compaction (so 
> we overwrite all of the data), the temp table stats can simply be copied over 
> to the main table with correct snapshot information, saving us a table scan.
> In phase 3, we can add custom stats collection logic to full ACID compactor 
> that is not query based, the same way as we'd do for (2). Alternatively we 
> can wait for ACID compactor to become query based and just reuse (2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-19418) add background stats updater similar to compactor

Reply via email to