Csaba Ringhofer created IMPALA-13475:
----------------------------------------
Summary: Consider byte size when enqueuing deferred RPCs in KrpcDataStreamRecvr
Key: IMPALA-13475
URL: https://issues.apache.org/jira/browse/IMPALA-13475
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Csaba Ringhofer
KrpcDataStreamRecvr::SenderQueue::ProcessDeferredRpc() can fail to process a
deferred RPC if batch_queue is not empty and the batch queue plus the batch
currently being processed would consume too much memory (see
KrpcDataStreamRecvr::CanEnqueue for details). In this case the deferred RPC is
moved back to the queue.
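The rejection rule can be sketched as a small single-threaded model. This is a sketch under assumptions, not Impala's code: SenderQueueModel, its fields and the byte sizes are hypothetical, and the real SenderQueue is guarded by a lock, tracks in-flight enqueues and reads its limit from the receiver. Only the condition itself (accept if batch_queue is empty, otherwise only if the queued bytes plus the incoming batch stay within the limit) is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of the receiver-side queue state.
// Not Impala's SenderQueue: no locking, no in-flight enqueue tracking.
struct SenderQueueModel {
  std::deque<int64_t> batch_queue;    // byte sizes of already deserialized batches
  std::deque<int64_t> deferred_rpcs;  // byte sizes of batches still in deferred RPCs
  int64_t buffered_bytes = 0;         // sum of the sizes in batch_queue
  int64_t limit_bytes = 0;            // hypothetical receiver buffer limit

  // Rule from the issue text: an empty batch queue always accepts a batch;
  // otherwise queue + incoming batch must not exceed the limit.
  bool CanEnqueue(int64_t batch_size) const {
    return batch_queue.empty() || buffered_bytes + batch_size <= limit_bytes;
  }

  // Returns true if the front deferred RPC became a queued batch, false if it
  // was rejected and moved back (modeled as re-inserting it at the front).
  bool ProcessDeferredRpc() {
    if (deferred_rpcs.empty()) return false;
    int64_t size = deferred_rpcs.front();
    deferred_rpcs.pop_front();
    if (!CanEnqueue(size)) {
      deferred_rpcs.push_front(size);  // failure path: put the RPC back
      return false;
    }
    batch_queue.push_back(size);
    buffered_bytes += size;
    return true;
  }
};
```

For example, with a 100-byte limit and two deferred 60-byte batches, the first call succeeds because the queue is empty, while the second is rejected (60 + 60 > 100) and its RPC is put back.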
Meanwhile, KrpcDataStreamRecvr::SenderQueue::GetBatch() doesn't consider the
memory requirements of the batches when it initiates deserialization of deferred
RPCs (EnqueueDeserializeTask) and tries to deserialize as many batches in
parallel as possible (up to FLAGS_datastream_service_num_deserialization_threads,
https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/be/src/runtime/krpc-data-stream-recvr.cc#L281).
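One possible shape for the improvement named in the summary is to let GetBatch() predict, from the byte sizes of the deferred RPCs, how many of them could pass the CanEnqueue check before it starts deserialization tasks. The helper below is hypothetical (NumDeserializeTasksToStart and its parameters do not exist in Impala) and assumes that the byte size of each deferred batch is known before deserialization and that no other deserialization is already in flight.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical helper, not part of Impala: predicts how many deferred RPCs
// can be deserialized in parallel without failing a CanEnqueue-style check.
//   deferred_sizes:    byte sizes of deferred batches in arrival order (assumed known)
//   batch_queue_empty: whether the batch queue is currently empty
//   buffered_bytes:    bytes currently held by the batch queue
//   limit_bytes:       receiver buffer limit
//   num_threads:       number of deserialization threads
int NumDeserializeTasksToStart(const std::deque<int64_t>& deferred_sizes,
                               bool batch_queue_empty, int64_t buffered_bytes,
                               int64_t limit_bytes, int num_threads) {
  int tasks = 0;
  bool queue_empty = batch_queue_empty;
  int64_t projected_bytes = buffered_bytes;
  for (int64_t size : deferred_sizes) {
    if (tasks >= num_threads) break;  // never exceed the thread count
    // Same rule as CanEnqueue: an empty queue always accepts a batch,
    // otherwise the projected footprint must stay within the limit.
    if (!queue_empty && projected_bytes + size > limit_bytes) break;
    projected_bytes += size;
    queue_empty = false;  // later batches will see a non-empty queue
    ++tasks;
  }
  return tasks;
}
```

Stopping at the first batch that does not fit, rather than skipping ahead to a smaller one, keeps the deferred RPCs in arrival order. With a 100-byte limit, an empty queue and four 60-byte batches, only one task would be started instead of one per thread.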
This means that several threads may start ProcessDeferredRpc() even when
GetBatch() could have predicted that most of them would fail due to the memory
limit. Although ProcessDeferredRpc() fails early in this case and does little
work, these extra failed attempts make lock contention in
KrpcDataStreamRecvr::SenderQueue worse. In the worst case, when only one batch
fits in memory, this can lead to
O(FLAGS_datastream_service_num_deserialization_threads * num_batches) wasted
ProcessDeferredRpc() attempts.
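The worst-case bound can be checked with a simple counting model. This is an assumption, not a trace of Impala's actual scheduling: every batch is large enough that only one fits in memory, each round a naive GetBatch() starts one attempt per deserialization thread (capped by the number of remaining deferred RPCs), exactly one attempt succeeds, and the consumer drains that batch before the next round. WastedAttemptsNaive is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical worst-case model: only one batch fits in memory at a time.
// Each round starts min(num_threads, remaining) attempts; one attempt enqueues
// its batch and the others fail CanEnqueue and put their RPCs back. The
// consumer then drains the queued batch, so `remaining` drops by one.
// Assumes num_threads >= 1.
int64_t WastedAttemptsNaive(int64_t num_batches, int64_t num_threads) {
  int64_t wasted = 0;
  for (int64_t remaining = num_batches; remaining > 0; --remaining) {
    int64_t attempts = std::min(num_threads, remaining);
    wasted += attempts - 1;  // every attempt except the successful one is wasted
  }
  return wasted;
}
```

For N >= T batches (T threads) this model gives (T - 1) * N - T * (T - 1) / 2 wasted attempts, e.g. 6972 for N = 1000 and T = 8, which grows as O(T * N).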
--
This message was sent by Atlassian Jira
(v8.20.10#820010)