[ 
https://issues.apache.org/jira/browse/IMPALA-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-14969:
------------------------------------
    Attachment: profile_scan_false_completion.txt

> False completion in HdfsScanNode when scanner failed to open due to 
> cancellation
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-14969
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14969
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: profile_scan_false_completion.txt
>
>
> False completion in HdfsScanNode when scanner failed to open due to 
> cancellation
> When using MT_DOP=0, HdfsScanNode spawns scanner threads to fill its RowBatch 
> queue asynchronously. The fragment instance thread invokes 
> HdfsScanNode::GetNext() to get RowBatches from the queue. If it gets nothing 
> (the queue is shutdown), eos (EndOfStream) is set to true 
> ([code|https://github.com/apache/impala/blob/739f9cd6a3939a3d2eaede88e6901429a2b0201c/be/src/exec/hdfs-scan-node.cc#L133-L145]),
>  which means the execution completes. In most of the completes, 
> HdfsScanNode::GetNext() returns an OK Status and sets eos=true. Setting 
> eos=true also adds a "Last Batch Returned" event in the Node Lifecycle Event 
> Timeline in profile.
> However, the RowBatch queue could also be shutdown due to scanner aborts 
> unexpectedly, e.g. when the fragment instance is cancelled.
> Fragment instance cancellation code path:
>  * ControlService::CancelQueryFInstances -> QueryState::Cancel -> 
> FragmentInstanceState::Cancel -> RuntimeState::Cancel -> set is_cancelled_ to 
> true. ScannerContext::cancelled() will return true due to this.
> Scanner thread cancellation code path example:
>  * HdfsScanNode::ScannerThread -> ProcessSplit -> CreateAndOpenScannerHelper 
> -> HdfsParquetScanner::Open -> ProcessFooter -> 
> ScannerContext::Stream::ReadBytes -> GetBytes -> GetBytesInternal -> 
> GetNextBuffer -> Found ScannerContext::cancelled() is true, returns 
> Status::CancelledInternal("ScannerContext").
>  * The non-OK Status returned from CreateAndOpenScannerHelper() is used in 
> HdfsScanNode::SetError -> SetDoneInternal -> thread_state_.Shutdown() 
> shutdown the RowBatch queue.
> In this case, HdfsScanNode::GetNext() returns a non-OK status but still set 
> eos=true. Then query profile shows "Last Batch Returned" in this node, which 
> is misleading especially if nothing has been returned from the node. It's 
> more confusing when the cancellation is triggered internally, e.g. when the 
> coordinator fragment reaches its limit, coordinator cancells all the fragment 
> instances. The query finishes successfully but some HdfsScanNodes have 
> RowsRead=0 and RowsReturned=0, seems like the table/partitions are empty or 
> files are skipped by runtime filters.
> profile_scan_false_completion.txt is an example. In the ExecSummary, ScanNode 
> on tpch_nested_parquet.customer just returns 6 rows.
> {noformat}
> Operator                    #Hosts  #Inst   Avg Time  Max Time  #Rows  Est. 
> #Rows  Peak Mem  Est. Peak Mem  Detail     
> -------------------------------------------------------------------------------------------------------------------------------------------
> F02:ROOT                         1      1   11.344us  11.344us                
>       4.01 MB        4.00 MB     
> 06:EXCHANGE                      1      1    5.603us   5.603us     10         
>  10  16.00 KB       16.00 KB  UNPARTITIONED     
> F01:EXCHANGE SENDER              3      3   34.231us  48.626us                
>      358.00 B       80.00 KB     
> 00:UNION                         3      3    0.000ns   0.000ns     10         
>  10  41.00 KB              0  CANCELLED     
> 02:SUBPLAN                       3      3    0.000ns   0.000ns      0       
> 1.50M  33.00 KB              0  CANCELLED     
> |--05:NESTED LOOP JOIN           3      3   57.657us  63.183us      0         
>  10  24.00 KB        20.00 B  CROSS JOIN, CANCELLED     
> |  |--03:SINGULAR ROW SRC        3      3    0.000ns   0.000ns      0         
>   1         0              0  CANCELLED     
> |  04:UNNEST                     3      3   10.494us  11.744us      0         
>  10         0              0  c.c_orders arr, CANCELLED     
> 01:SCAN HDFS                     3      3  785.984us   1.113ms      6     
> 150.00K  16.01 MB      176.00 MB  tpch_nested_parquet.customer c{noformat}
> Here is one of the misleading node profile:
> {noformat}
>         HDFS_SCAN_NODE (id=1):(Total: 1.113ms, non-child: 1.113ms, % 
> non-child: 100.00%)
>           Table Name: tpch_nested_parquet.customer
>           Hdfs split stats (<volume id>:<# splits>/<split lengths>): 
> 0:2/192.55 MB
>           Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0% 
> 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0% 16:0% 17:0% 18:0% 
> 19:0% 20:0% 21:0% 22:0% 23:0% 24:0% 25:0% 26:0% 27:0% 28:0% 29:0% 30:0% 31:0% 
> 32:0%
>           File Formats: PARQUET/Unknown(Skipped):2
>           ExecOption: Codegen enabled: 2 out of 2, PARQUET Codegen Enabled
>           Node Lifecycle Event Timeline: 19.806ms
>              - Open Started: 18.758ms (18.758ms)
>              - Open Finished: 18.780ms (22.875us)
>              - First Batch Requested: 18.788ms (7.723us)
>              - First Batch Returned: 19.790ms (1.001ms)
>              - Last Batch Returned: 19.790ms (69.000ns)
>              - Closed: 19.806ms (16.062us)
>            ...
>            - RowsRead: 0 (0)
>            - RowsReturned: 0 (0){noformat}
> Though it shows "Last Batch Returned", the RowsRead and RowsReturned are all 
> 0. "File Formats: PARQUET/Unknown(Skipped):2" can explain a little bit, but 
> there are no reason to explain why the files are skipped. Note that no 
> runtime filters in this case.
> It makes more sense to remove the "Last Batch Returned" event, which also 
> helps HBO to skip tracking the cardinality of the scan node (IMPALA-14843).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to