This is an automated email from the ASF dual-hosted git repository. stigahuang pushed a commit to branch branch-4.1.1 in repository https://gitbox.apache.org/repos/asf/impala.git
commit 6e404490ace4cccae70faefdc3ece4fae69abc86 Author: Yida Wu <[email protected]> AuthorDate: Fri Sep 30 10:18:17 2022 -0700 IMPALA-11631 Fix impala crashes in impala::TopNNode::Heap::Close() The bug is introduced by IMPALA-11631, if RematerializeTuples() fails in ReclaimTuplePool(), it returns immediately with an error, however, some Heap unique_ptr in the partition_heaps_ could be already moved to the rematerialized_heaps, and set to nullptr, while the Close() of the TopNNode will try to call Close() on all the Heap unique_ptr in the partition_heaps_, which leads to a crash. There could be two issues, the calling on a nullptr as described above and a potential memory leaking. Because Heap doesn't call Close() in the destructor, the unique_ptr may not release all the resources properly if we don't call the Close() explicitly for the Heap. The patch changes the logic of moving each Heap object after one rematerialize process succeeds, instead, we will move all the Heap objects in the partition_heaps_ only when all the rematerialize processes succeed. Therefore, there will be no half released partition_heaps_, all the Heap should be called Close() during the TopNNode closing. Also, added checking for nullptr Heaps in the Close() process of TopNNode. Because it could be difficult for a testcase to inject an error for this case to create a crash. I did some hacking in the code to inject a memory allocation failure in certain cases, reproduced the issue, and proved the patch can solve the issue manually. Tests: Ran core tests. Manual test. Change-Id: Iaf45b6ef777f68e1843c076a935e4189acc6990b Reviewed-on: http://gerrit.cloudera.org:8080/19087 Reviewed-by: Abhishek Rawat <[email protected]> Tested-by: Impala Public Jenkins <[email protected]> Reviewed-on: http://gerrit.cloudera.org:8080/19127 Reviewed-by: Yida Wu <[email protected]> Tested-by: Quanlong Huang <[email protected]> --- be/src/exec/topn-node.cc | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/be/src/exec/topn-node.cc b/be/src/exec/topn-node.cc index f65af7df3..7d2ff11b1 100644 --- a/be/src/exec/topn-node.cc +++ b/be/src/exec/topn-node.cc @@ -518,7 +518,8 @@ void TopNNode::Close(RuntimeState* state) { if (is_closed()) return; if (heap_ != nullptr) heap_->Close(); for (auto& entry : partition_heaps_) { - entry.second->Close(); + DCHECK(entry.second != nullptr); + if (entry.second != nullptr) entry.second->Close(); } if (tuple_pool_.get() != nullptr) tuple_pool_->FreeAll(); if (order_cmp_.get() != nullptr) order_cmp_->Close(state); @@ -690,6 +691,13 @@ Status TopNNode::ReclaimTuplePool(RuntimeState* state) { for (auto& entry : partition_heaps_) { RETURN_IF_ERROR(entry.second->RematerializeTuples(this, state, temp_pool.get())); DCHECK(entry.second->DCheckConsistency()); + } + // The second loop is needed for IMPALA-11631. We only move heaps from partition_heap_ + // to rematerialized_heaps once all have been rematerialized. Otherwise, in case of + // an error, we may call Close() on a nullptr or leak the memory by not explicitly + // calling Close() on the heap pointer. Maybe better to add Close() in the Heap + // destructor later. + for (auto& entry : partition_heaps_) { // The key references memory in 'tuple_pool_'. Replace it with a rematerialized // tuple. rematerialized_heaps.push_back(move(entry.second));
