This is an automated email from the ASF dual-hosted git repository.
arawat pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new e142e48f3 IMPALA-11631 Fix impala crashes in impala::TopNNode::Heap::Close()
e142e48f3 is described below
commit e142e48f30f223e514101a8318503063862f691b
Author: Yida Wu <[email protected]>
AuthorDate: Fri Sep 30 10:18:17 2022 -0700
IMPALA-11631 Fix impala crashes in impala::TopNNode::Heap::Close()
The bug is introduced by IMPALA-11631. If RematerializeTuples()
fails in ReclaimTuplePool(), the function returns immediately with
an error. However, some Heap unique_ptrs in partition_heaps_ may
already have been moved to rematerialized_heaps and set to nullptr.
TopNNode::Close() then calls Close() on every Heap unique_ptr in
partition_heaps_, which leads to a crash.
There are two issues: calling Close() on a nullptr, as described
above, and a potential memory leak. Because Heap doesn't call
Close() in its destructor, the unique_ptr may not release all
resources properly unless Close() is called explicitly on the
Heap.
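Both issues can be reproduced in isolation. The following is a minimal
sketch, not Impala code: Resource is a hypothetical stand-in for Heap,
and the open counter is an assumption used only to make the leak
observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource holder. Like the Heap described above, its
// destructor does not call Close().
struct Resource {
  int* open_count;  // external counter so the effect outlives the object
  explicit Resource(int* c) : open_count(c) { ++*open_count; }
  void Close() { --*open_count; }
};

// Returns how many resources are still "open" after the unique_ptr that
// owned the object has been destroyed.
int LeakedAfterDestroy(bool call_close) {
  int open = 0;
  {
    auto r = std::make_unique<Resource>(&open);
    if (call_close) r->Close();
  }  // ~unique_ptr deletes the object but never calls Close()
  return open;
}

// Returns true if the source pointer is null after being moved from.
bool MovedFromIsNull() {
  int open = 0;
  auto a = std::make_unique<Resource>(&open);
  auto b = std::move(a);  // 'a' is now nullptr; a->Close() would crash
  b->Close();
  return a == nullptr;
}
```

Without the explicit Close() call, the counter stays at one after the
owning pointer is destroyed, which is the leak described above.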
Previously, each Heap object was moved as soon as its
rematerialization succeeded. With this patch, the Heap objects in
partition_heaps_ are moved only after all rematerializations
succeed. Therefore partition_heaps_ is never left half released,
and Close() is called on every Heap when the TopNNode closes. The
patch also adds a nullptr check for Heaps in TopNNode::Close().
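The two-loop approach can be sketched as follows. This is an
illustration under stated assumptions, not the actual TopNNode code:
Heap, HeapMap, Reclaim() and CloseAll() are hypothetical stand-ins, a
bool return replaces Status, and clearing the map after the move is an
assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for TopNNode::Heap.
struct Heap {
  bool fail_rematerialize = false;
  bool closed = false;
  // Returns false to simulate a failing RematerializeTuples().
  bool Rematerialize() { return !fail_rematerialize; }
  void Close() { closed = true; }
};

using HeapMap = std::map<int, std::unique_ptr<Heap>>;

// Loop 1: rematerialize every heap and return on the first failure,
// before any pointer has been moved.
// Loop 2: move the heaps out only once all of them succeeded.
bool Reclaim(HeapMap& heaps, std::vector<std::unique_ptr<Heap>>& out) {
  for (auto& entry : heaps) {
    if (!entry.second->Rematerialize()) return false;
  }
  for (auto& entry : heaps) {
    out.push_back(std::move(entry.second));
  }
  heaps.clear();  // assumption: drop the now-null entries
  return true;
}

// Mirrors the defensive nullptr check added to Close(). Returns the
// number of heaps that were closed.
int CloseAll(HeapMap& heaps) {
  int closed = 0;
  for (auto& entry : heaps) {
    if (entry.second != nullptr) {
      entry.second->Close();
      ++closed;
    }
  }
  return closed;
}
```

If Reclaim() fails, every entry in the map still owns its Heap, so
CloseAll() can close all of them without dereferencing a nullptr.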
Because it is difficult for a test case to inject an error that
triggers this crash, I temporarily modified the code to inject a
memory allocation failure in certain cases, reproduced the issue,
and manually verified that the patch fixes it.
Tests:
Ran core tests.
Manual test.
Change-Id: Iaf45b6ef777f68e1843c076a935e4189acc6990b
Reviewed-on: http://gerrit.cloudera.org:8080/19087
Reviewed-by: Abhishek Rawat <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
be/src/exec/topn-node.cc | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/be/src/exec/topn-node.cc b/be/src/exec/topn-node.cc
index 2e331a6a8..72ceb885f 100644
--- a/be/src/exec/topn-node.cc
+++ b/be/src/exec/topn-node.cc
@@ -519,7 +519,8 @@ void TopNNode::Close(RuntimeState* state) {
if (is_closed()) return;
if (heap_ != nullptr) heap_->Close();
for (auto& entry : partition_heaps_) {
- entry.second->Close();
+ DCHECK(entry.second != nullptr);
+ if (entry.second != nullptr) entry.second->Close();
}
if (tuple_pool_.get() != nullptr) tuple_pool_->FreeAll();
if (order_cmp_.get() != nullptr) order_cmp_->Close(state);
@@ -691,6 +692,13 @@ Status TopNNode::ReclaimTuplePool(RuntimeState* state) {
for (auto& entry : partition_heaps_) {
RETURN_IF_ERROR(entry.second->RematerializeTuples(this, state, temp_pool.get()));
DCHECK(entry.second->DCheckConsistency());
+ }
+ // The second loop is needed for IMPALA-11631. We only move heaps from partition_heap_
+ // to rematerialized_heaps once all have been rematerialized. Otherwise, in case of
+ // an error, we may call Close() on a nullptr or leak the memory by not explicitly
+ // calling Close() on the heap pointer. Maybe better to add Close() in the Heap
+ // destructor later.
+ for (auto& entry : partition_heaps_) {
// The key references memory in 'tuple_pool_'. Replace it with a rematerialized
// tuple.
rematerialized_heaps.push_back(move(entry.second));