This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch branch-4.1.1
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 6e404490ace4cccae70faefdc3ece4fae69abc86
Author: Yida Wu <[email protected]>
AuthorDate: Fri Sep 30 10:18:17 2022 -0700

    IMPALA-11631 Fix impala crashes in impala::TopNNode::Heap::Close()
    
    The bug is introduced by IMPALA-11631, if RematerializeTuples()
    fails in ReclaimTuplePool(), it returns immediately with an error,
    however, some Heap unique_ptr in the partition_heaps_ could be
    already moved to the rematerialized_heaps, and set to nullptr,
    while the Close() of the TopNNode will try to call Close() on all
    the Heap unique_ptr in the partition_heaps_, which leads to a
    crash.
    
    There could be two issues, the calling on a nullptr as described
    above and a potential memory leaking. Because Heap doesn't call
    Close() in the destructor, the unique_ptr may not release all
    the resources properly if we don't call the Close() explicitly
    for the Heap.
    
    The patch changes the logic of moving each Heap object after
    one rematerialize process succeeds, instead, we will move all the
    Heap objects in the partition_heaps_ only when all the rematerialize
    processes succeed. Therefore, there will be no half released
    partition_heaps_, all the Heap should be called Close() during the
    TopNNode closing. Also, added checking for nullptr Heaps in the
    Close() process of TopNNode.
    
    Because it could be difficult for a testcase to inject an error
    for this case to create a crash. I did some hacking in the
    code to inject a memory allocation failure in certain cases,
    reproduced the issue, and proved the patch can solve the issue
    manually.
    
    Tests:
    Ran core tests.
    Manual test.
    
    Change-Id: Iaf45b6ef777f68e1843c076a935e4189acc6990b
    Reviewed-on: http://gerrit.cloudera.org:8080/19087
    Reviewed-by: Abhishek Rawat <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-on: http://gerrit.cloudera.org:8080/19127
    Reviewed-by: Yida Wu <[email protected]>
    Tested-by: Quanlong Huang <[email protected]>
---
 be/src/exec/topn-node.cc | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/be/src/exec/topn-node.cc b/be/src/exec/topn-node.cc
index f65af7df3..7d2ff11b1 100644
--- a/be/src/exec/topn-node.cc
+++ b/be/src/exec/topn-node.cc
@@ -518,7 +518,8 @@ void TopNNode::Close(RuntimeState* state) {
   if (is_closed()) return;
   if (heap_ != nullptr) heap_->Close();
   for (auto& entry : partition_heaps_) {
-    entry.second->Close();
+    DCHECK(entry.second != nullptr);
+    if (entry.second != nullptr) entry.second->Close();
   }
   if (tuple_pool_.get() != nullptr) tuple_pool_->FreeAll();
   if (order_cmp_.get() != nullptr) order_cmp_->Close(state);
@@ -690,6 +691,13 @@ Status TopNNode::ReclaimTuplePool(RuntimeState* state) {
     for (auto& entry : partition_heaps_) {
       RETURN_IF_ERROR(entry.second->RematerializeTuples(this, state, 
temp_pool.get()));
       DCHECK(entry.second->DCheckConsistency());
+    }
+    // The second loop is needed for IMPALA-11631. We only move heaps from 
partition_heap_
+    // to rematerialized_heaps once all have been rematerialized. Otherwise, 
in case of
+    // an error, we may call Close() on a nullptr or leak the memory by not 
explicitly
+    // calling Close() on the heap pointer. Maybe better to add Close() in the 
Heap
+    // destructor later.
+    for (auto& entry : partition_heaps_) {
       // The key references memory in 'tuple_pool_'. Replace it with a 
rematerialized
       // tuple.
       rematerialized_heaps.push_back(move(entry.second));

Reply via email to