[ https://issues.apache.org/jira/browse/HIVE-27327?focusedWorklogId=861652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861652 ]
ASF GitHub Bot logged work on HIVE-27327:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/May/23 01:21
            Start Date: 12/May/23 01:21
    Worklog Time Spent: 10m
      Work Description: sonarcloud[bot] commented on PR #4301:
URL: https://github.com/apache/hive/pull/4301#issuecomment-1544970047

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4301)

   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4301&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4301&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4301&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [1 Code Smell](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4301&resolved=false&types=CODE_SMELL) (rating A)

   [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4301&metric=coverage&view=list)
   [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4301&metric=duplicated_lines_density&view=list)


Issue Time Tracking
-------------------

    Worklog Id:     (was: 861652)
    Time Spent: 3.5h  (was: 3h 20m)

> Iceberg basic stats: Incorrect row count in snapshot summary leading to
> unoptimized plans
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-27327
>                 URL: https://issues.apache.org/jira/browse/HIVE-27327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>
>
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +------+
> | _c0  |
> +------+
> | 46   |
> +------+
> 1 row selected (7.22 seconds)
> {noformat}
>
> But total-records in the snapshot summary indicates that there are 300 records:
>
> {noformat}
> {
>   "sequence-number" : 19,
>   "snapshot-id" : 4237525869561629328,
>   "parent-snapshot-id" : 2572487769557272977,
>   "timestamp-ms" : 1683553017982,
>   "summary" : {
>     "operation" : "append",
>     "added-data-files" : "5",
>     "added-records" : "12",
>     "added-files-size" : "3613",
>     "changed-partition-count" : "5",
>     "total-records" : "300",
>     "total-files-size" : "164405",
>     "total-data-files" : "100",
>     "total-delete-files" : "73",
>     "total-position-deletes" : "254",
>     "total-equality-deletes" : "0"
>   }{noformat}
>
> As a result, the Hive plans generated are unoptimized.
> {noformat}
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set itemid=7 where itemid=5;
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | Reducer 3 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-4 |
> | Stats Work{} |
> | Stage-0 |
> | Move Operator |
> | table:{"name:":"db.llap_orders"} |
> | Stage-3 |
> | Dependency Collection{} |
> | Stage-2 |
> | Reducer 2 vectorized |
> | File Output Operator [FS_14] |
> | table:{"name:":"db.llap_orders"} |
> | Select Operator [SEL_13] (rows=150 width=424) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] |
> | <-Map 1 [SIMPLE_EDGE] |
> | SHUFFLE [RS_4] |
> | Select Operator [SEL_3] (rows=150 width=424) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"] |
> | Select Operator [SEL_2] (rows=150 width=644) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"] |
> | Filter Operator [FIL_9] (rows=150 width=220) |
> | predicate:(itemid = 5) |
> | TableScan [TS_0] (rows=300 width=220) |
> | db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"] |
> | Reducer 3 vectorized |
> | File Output Operator [FS_16] |
> | table:{"name:":"db.llap_orders"} |
> | Select Operator [SEL_15] |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col4","_col5"] |
> | <-Map 1 [SIMPLE_EDGE] |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col4, _col5 |
> | Select Operator [SEL_7] (rows=150 width=220) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
> | Please refer to the previous Select Operator [SEL_2] |
> | |
> +----------------------------------------------------+
> 39 rows selected (0.104 seconds)
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri>{noformat}
>
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
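
The row-count formula quoted in the issue can be checked against the snapshot summary values it reports. The following is a minimal Python sketch under stated assumptions, not the Hive or Iceberg implementation: the function name estimated_row_count is hypothetical, the summary values are assumed to be strings as in the quoted JSON, and returning None when equality deletes are present is an illustrative choice, since the issue only states the formula for the case without them.

```python
from typing import Mapping, Optional


def estimated_row_count(summary: Mapping[str, str]) -> Optional[int]:
    """Estimate live rows from a snapshot summary (hypothetical helper).

    Applies row_count = total-records - total-position-deletes, which the
    issue states holds only in the absence of equality deletes.
    """
    # Summary values are strings in the quoted JSON, so convert explicitly.
    equality_deletes = int(summary.get("total-equality-deletes", "0"))
    if equality_deletes != 0:
        # Assumption: the formula does not apply here, so give no estimate.
        return None
    total_records = int(summary["total-records"])
    position_deletes = int(summary.get("total-position-deletes", "0"))
    return total_records - position_deletes


# Values copied from the snapshot summary quoted in the issue.
summary = {
    "total-records": "300",
    "total-position-deletes": "254",
    "total-equality-deletes": "0",
}
print(estimated_row_count(summary))  # 300 - 254 = 46
```

With the quoted values this yields 46, which matches the count(*) result in the issue, whereas the TableScan in the quoted plan is estimated at rows=300.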