[
https://issues.apache.org/jira/browse/IMPALA-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-14969:
------------------------------------
Attachment: profile_scan_false_completion.txt
> False completion in HdfsScanNode when scanner failed to open due to
> cancellation
> --------------------------------------------------------------------------------
>
> Key: IMPALA-14969
> URL: https://issues.apache.org/jira/browse/IMPALA-14969
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments: profile_scan_false_completion.txt
>
>
> False completion in HdfsScanNode when scanner failed to open due to
> cancellation
> When using MT_DOP=0, HdfsScanNode spawns scanner threads to fill its RowBatch
> queue asynchronously. The fragment instance thread invokes
> HdfsScanNode::GetNext() to get RowBatches from the queue. If it gets nothing
> (the queue is shutdown), eos (EndOfStream) is set to true
> ([code|https://github.com/apache/impala/blob/739f9cd6a3939a3d2eaede88e6901429a2b0201c/be/src/exec/hdfs-scan-node.cc#L133-L145]),
> which means the execution completes. In most of the completes,
> HdfsScanNode::GetNext() returns an OK Status and sets eos=true. Setting
> eos=true also adds a "Last Batch Returned" event in the Node Lifecycle Event
> Timeline in profile.
> However, the RowBatch queue could also be shutdown due to scanner aborts
> unexpectedly, e.g. when the fragment instance is cancelled.
> Fragment instance cancellation code path:
> * ControlService::CancelQueryFInstances -> QueryState::Cancel ->
> FragmentInstanceState::Cancel -> RuntimeState::Cancel -> set is_cancelled_ to
> true. ScannerContext::cancelled() will return true due to this.
> Scanner thread cancellation code path example:
> * HdfsScanNode::ScannerThread -> ProcessSplit -> CreateAndOpenScannerHelper
> -> HdfsParquetScanner::Open -> ProcessFooter ->
> ScannerContext::Stream::ReadBytes -> GetBytes -> GetBytesInternal ->
> GetNextBuffer -> Found ScannerContext::cancelled() is true, returns
> Status::CancelledInternal("ScannerContext").
> * The non-OK Status returned from CreateAndOpenScannerHelper() is used in
> HdfsScanNode::SetError -> SetDoneInternal -> thread_state_.Shutdown()
> shutdown the RowBatch queue.
> In this case, HdfsScanNode::GetNext() returns a non-OK status but still set
> eos=true. Then query profile shows "Last Batch Returned" in this node, which
> is misleading especially if nothing has been returned from the node. It's
> more confusing when the cancellation is triggered internally, e.g. when the
> coordinator fragment reaches its limit, coordinator cancells all the fragment
> instances. The query finishes successfully but some HdfsScanNodes have
> RowsRead=0 and RowsReturned=0, seems like the table/partitions are empty or
> files are skipped by runtime filters.
> profile_scan_false_completion.txt is an example. In the ExecSummary, ScanNode
> on tpch_nested_parquet.customer just returns 6 rows.
> {noformat}
> Operator #Hosts #Inst Avg Time Max Time #Rows Est.
> #Rows Peak Mem Est. Peak Mem Detail
> -------------------------------------------------------------------------------------------------------------------------------------------
> F02:ROOT 1 1 11.344us 11.344us
> 4.01 MB 4.00 MB
> 06:EXCHANGE 1 1 5.603us 5.603us 10
> 10 16.00 KB 16.00 KB UNPARTITIONED
> F01:EXCHANGE SENDER 3 3 34.231us 48.626us
> 358.00 B 80.00 KB
> 00:UNION 3 3 0.000ns 0.000ns 10
> 10 41.00 KB 0 CANCELLED
> 02:SUBPLAN 3 3 0.000ns 0.000ns 0
> 1.50M 33.00 KB 0 CANCELLED
> |--05:NESTED LOOP JOIN 3 3 57.657us 63.183us 0
> 10 24.00 KB 20.00 B CROSS JOIN, CANCELLED
> | |--03:SINGULAR ROW SRC 3 3 0.000ns 0.000ns 0
> 1 0 0 CANCELLED
> | 04:UNNEST 3 3 10.494us 11.744us 0
> 10 0 0 c.c_orders arr, CANCELLED
> 01:SCAN HDFS 3 3 785.984us 1.113ms 6
> 150.00K 16.01 MB 176.00 MB tpch_nested_parquet.customer c{noformat}
> Here is one of the misleading node profile:
> {noformat}
> HDFS_SCAN_NODE (id=1):(Total: 1.113ms, non-child: 1.113ms, %
> non-child: 100.00%)
> Table Name: tpch_nested_parquet.customer
> Hdfs split stats (<volume id>:<# splits>/<split lengths>):
> 0:2/192.55 MB
> Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0%
> 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0% 16:0% 17:0% 18:0%
> 19:0% 20:0% 21:0% 22:0% 23:0% 24:0% 25:0% 26:0% 27:0% 28:0% 29:0% 30:0% 31:0%
> 32:0%
> File Formats: PARQUET/Unknown(Skipped):2
> ExecOption: Codegen enabled: 2 out of 2, PARQUET Codegen Enabled
> Node Lifecycle Event Timeline: 19.806ms
> - Open Started: 18.758ms (18.758ms)
> - Open Finished: 18.780ms (22.875us)
> - First Batch Requested: 18.788ms (7.723us)
> - First Batch Returned: 19.790ms (1.001ms)
> - Last Batch Returned: 19.790ms (69.000ns)
> - Closed: 19.806ms (16.062us)
> ...
> - RowsRead: 0 (0)
> - RowsReturned: 0 (0){noformat}
> Though it shows "Last Batch Returned", the RowsRead and RowsReturned are all
> 0. "File Formats: PARQUET/Unknown(Skipped):2" can explain a little bit, but
> there are no reason to explain why the files are skipped. Note that no
> runtime filters in this case.
> It makes more sense to remove the "Last Batch Returned" event, which also
> helps HBO to skip tracking the cardinality of the scan node (IMPALA-14843).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]