[ https://issues.apache.org/jira/browse/HIVE-27327?focusedWorklogId=861210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861210 ]
ASF GitHub Bot logged work on HIVE-27327:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/May/23 14:27
            Start Date: 09/May/23 14:27
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on code in PR #4301:
URL: https://github.com/apache/hive/pull/4301#discussion_r1188693447

##########
iceberg/iceberg-handler/src/test/results/positive/vectorized_iceberg_merge_mixed.q.out:
##########
@@ -400,21 +345,21 @@ STAGE PLANS:
 native: true
 nativeConditionsMet: hive.mapjoin.optimized.hashtable IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, hive.execution.engine tez IN [tez] IS true, One MapJoin Condition IS true, No nullsafe IS true, Small table vectorizes IS true, Outer Join has keys IS true, Optimized Table and Supports Key Types IS true
 outerSmallTableKeyMapping: 2 -> 39, 3 -> 40
- projectedOutput: 33:int, 34:bigint, 35:string, 36:bigint, 37:int, 38:int, 39:int, 40:int, 41:int, 42:int, 43:int, 44:int, 45:int, 46:int, 47:int, 48:decimal(7,2), 49:decimal(7,2), 50:decimal(7,2), 51:decimal(7,2), 52:decimal(7,2), 53:decimal(7,2), 54:decimal(7,2), 55:decimal(7,2), 56:decimal(7,2), 57:decimal(7,2), 58:decimal(7,2), 59:decimal(7,2), 1:int, 2:int, 3:int, 4:int, 5:int, 6:int, 7:int, 8:int, 9:int, 10:int, 11:decimal(7,2), 12:decimal(7,2), 13:decimal(7,2), 14:decimal(7,2), 15:decimal(7,2), 16:decimal(7,2), 17:decimal(7,2), 18:decimal(7,2), 19:decimal(7,2), 20:decimal(7,2), 21:decimal(7,2), 22:decimal(7,2)
+ projectedOutput: 1:int, 2:int, 3:int, 4:int, 5:int, 6:int, 7:int, 8:int, 9:int, 10:int, 11:decimal(7,2), 12:decimal(7,2), 13:decimal(7,2), 14:decimal(7,2), 15:decimal(7,2), 16:decimal(7,2), 17:decimal(7,2), 18:decimal(7,2), 19:decimal(7,2), 20:decimal(7,2), 21:decimal(7,2), 22:decimal(7,2), 33:int, 34:bigint, 35:string, 36:bigint, 37:int, 38:int, 39:int, 40:int, 41:int, 42:int, 43:int, 44:int, 45:int, 46:int, 47:int, 48:decimal(7,2), 49:decimal(7,2), 50:decimal(7,2), 51:decimal(7,2), 52:decimal(7,2), 53:decimal(7,2), 54:decimal(7,2), 55:decimal(7,2), 56:decimal(7,2), 57:decimal(7,2), 58:decimal(7,2), 59:decimal(7,2)
 smallTableValueMapping: 33:int, 34:bigint, 35:string, 36:bigint, 37:int, 38:int, 41:int, 42:int, 43:int, 44:int, 45:int, 46:int, 47:int, 48:decimal(7,2), 49:decimal(7,2), 50:decimal(7,2), 51:decimal(7,2), 52:decimal(7,2), 53:decimal(7,2), 54:decimal(7,2), 55:decimal(7,2), 56:decimal(7,2), 57:decimal(7,2), 58:decimal(7,2), 59:decimal(7,2)
 hashTableImplementationType: OPTIMIZED
- outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48
+ outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col39, _col40, _col41, _col42, _col43, _col44, _col45, _col46, _col47, _col48, _col49, _col50
 input vertices:
- 0 Map 1
- Statistics: Num rows: 5 Data size: 4320 Basic stats: COMPLETE Column stats: COMPLETE
+ 1 Map 6
+ Statistics: Num rows: 7 Data size: 6656 Basic stats: COMPLETE Column stats: COMPLETE

Review Comment:
   was the row count inaccurate before?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 861210)
    Time Spent: 2h  (was: 1h 50m)

> Iceberg basic stats: Incorrect row count in snapshot summary leading to unoptimized plans
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-27327
>                 URL: https://issues.apache.org/jira/browse/HIVE-27327
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Simhadri Govindappa
>            Assignee: Simhadri Govindappa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> In the absence of equality deletes, the total row count should be:
> {noformat}
> row_count = total-records - total-position-deletes{noformat}
>
> Example:
> After many inserts and deletes, there are only 46 records in a table.
> {noformat}
> >>select count(*) from llap_orders;
> +------+
> | _c0  |
> +------+
> | 46   |
> +------+
> 1 row selected (7.22 seconds)
> {noformat}
>
> But the total-records value in the snapshot summary indicates that there are 300 records:
>
> {noformat}
> {
>   "sequence-number" : 19,
>   "snapshot-id" : 4237525869561629328,
>   "parent-snapshot-id" : 2572487769557272977,
>   "timestamp-ms" : 1683553017982,
>   "summary" : {
>     "operation" : "append",
>     "added-data-files" : "5",
>     "added-records" : "12",
>     "added-files-size" : "3613",
>     "changed-partition-count" : "5",
>     "total-records" : "300",
>     "total-files-size" : "164405",
>     "total-data-files" : "100",
>     "total-delete-files" : "73",
>     "total-position-deletes" : "254",
>     "total-equality-deletes" : "0"
>   }{noformat}
>
> As a result, the Hive plans generated are unoptimized.
> {noformat}
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri> explain update llap_orders set itemid=7 where itemid=5;
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | Reducer 3 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-4 |
> | Stats Work{} |
> | Stage-0 |
> | Move Operator |
> | table:{"name:":"db.llap_orders"} |
> | Stage-3 |
> | Dependency Collection{} |
> | Stage-2 |
> | Reducer 2 vectorized |
> | File Output Operator [FS_14] |
> | table:{"name:":"db.llap_orders"} |
> | Select Operator [SEL_13] (rows=150 width=424) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] |
> | <-Map 1 [SIMPLE_EDGE] |
> | SHUFFLE [RS_4] |
> | Select Operator [SEL_3] (rows=150 width=424) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9"] |
> | Select Operator [SEL_2] (rows=150 width=644) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col7","_col8","_col9","_col10","_col11","_col13","_col14","_col15"] |
> | Filter Operator [FIL_9] (rows=150 width=220) |
> | predicate:(itemid = 5) |
> | TableScan [TS_0] (rows=300 width=220) |
> | db@llap_orders,llap_orders,Tbl:COMPLETE,Col:COMPLETE,Output:["orderid","quantity","itemid","tradets","p1","p2"] |
> | Reducer 3 vectorized |
> | File Output Operator [FS_16] |
> | table:{"name:":"db.llap_orders"} |
> | Select Operator [SEL_15] |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col4","_col5"] |
> | <-Map 1 [SIMPLE_EDGE] |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col4, _col5 |
> | Select Operator [SEL_7] (rows=150 width=220) |
> | Output:["_col0","_col1","_col2","_col3","_col4","_col5"] |
> | Please refer to the previous Select Operator [SEL_2] |
> | |
> +----------------------------------------------------+
> 39 rows selected (0.104 seconds)
> 0: jdbc:hive2://simhadrigovindappa-2.simhadri>{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
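
The row-count formula in the issue description can be checked against the quoted snapshot summary. Below is a minimal sketch in Python, not the change made in PR #4301: the function name `estimated_row_count` is hypothetical, and the summary values are copied from the description (only the three fields the formula needs). It assumes that a summary reporting any equality deletes should yield no estimate, since the formula is stated only for the case without them.

```python
import json

# Summary fields copied from the snapshot quoted in the issue description.
SUMMARY_JSON = """
{
  "total-records": "300",
  "total-position-deletes": "254",
  "total-equality-deletes": "0"
}
"""


def estimated_row_count(summary):
    """Hypothetical helper: total-records minus total-position-deletes.

    Returns None when equality deletes are present, because the formula
    in the issue description only covers the case without them.
    """
    if int(summary.get("total-equality-deletes", "0")) > 0:
        return None
    total_records = int(summary["total-records"])
    position_deletes = int(summary.get("total-position-deletes", "0"))
    return total_records - position_deletes


summary = json.loads(SUMMARY_JSON)
print(estimated_row_count(summary))  # 300 - 254 = 46
```

With the quoted values this gives 46, which matches the `select count(*)` result in the description, whereas the plan's TableScan estimate (rows=300) matches the raw `total-records` value.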