Csaba Ringhofer created IMPALA-13949:
----------------------------------------
             Summary: Release plain encoded string buffers earlier if all rows are dropped
                 Key: IMPALA-13949
                 URL: https://issues.apache.org/jira/browse/IMPALA-13949
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Csaba Ringhofer

Currently Impala always attaches the data page buffers of plain encoded strings to the next row batch, even if all rows in the given page were discarded by the predicates. This can lead to surprising memory consumption in very selective queries.

{code}
set mt_dop=0;
set num_nodes=1;
set batch_size=1;
select l_comment from tpch_parquet.lineitem where l_comment like "%nomatch";
{code}

From the profile:
{code}
 - RowBatchBytesEnqueued: 174.75 MB (183239804)
 - RowBatchQueuePeakMemoryUsage: 8.06 MB (8453100)
 - RowBatchesEnqueued: 29 (29)
 - RowsRead: 6.00M (6001215)
 - RowsReturned: 0 (0)
{code}

What happens above is that the RowBatch hits AtCapacity() once the attached buffers reach the 8 MB memory limit, so 29 row batches with 0 rows are returned. This also has a performance impact, because freeing these buffers happens on a different thread than the allocation (in the mt_dop=0 case).
https://github.com/apache/impala/blob/f222574f04fc7b94e1ad514d7d720d50d036a226/be/src/exec/parquet/hdfs-parquet-scanner.cc#L2460

A solution could be to copy the strings when the predicates are very selective; see the sketch below. What complicates this is that the copy is only useful for plain encoding, not dictionary encoding, and in theory a single scratch batch can contain rows from both dictionary and plain encoded pages. Also, as a page may fill multiple row batches, a selective batch can be followed by a non-selective one for which attaching the buffer still makes sense.

Another thing that could affect this is the small string optimization: it is currently not applied in the Parquet scanner, but if all surviving rows from a data page were smallified, the original buffer could still be dropped.
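To make the idea concrete, here is a minimal, hypothetical C++ sketch of such a selectivity based decision: if only a small fraction of a plain encoded page's bytes survives the predicates, copy the surviving strings into the row batch's pool and release the page buffer immediately instead of attaching it. All type names, helpers and the threshold below are illustrative assumptions, not Impala's actual interfaces.

{code:cpp}
// Illustrative stand-ins only; not the real Impala classes.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a variable length string slot in a tuple.
struct StringSlot {
  const char* ptr;
  uint32_t len;
};

// Hypothetical stand-in for the row batch's tuple data pool.
struct TuplePool {
  std::vector<std::vector<char>> chunks;
  char* Allocate(size_t n) {
    chunks.emplace_back(n);
    return chunks.back().data();
  }
};

// Decide whether copying the survivors is cheaper than attaching the whole
// data page buffer to the row batch. The 1% threshold is a made-up example;
// a real implementation would have to tune or derive it.
bool ShouldCopySurvivors(size_t surviving_bytes, size_t page_buffer_bytes) {
  constexpr double kCopyThreshold = 0.01;
  return surviving_bytes < page_buffer_bytes * kCopyThreshold;
}

// Copy the surviving string slots out of the page buffer into the batch's
// pool, so the page buffer can be freed right away on the scanner thread
// instead of travelling with an almost empty row batch.
void CopySurvivingStrings(std::vector<StringSlot*>& survivors, TuplePool* pool) {
  for (StringSlot* slot : survivors) {
    char* dst = pool->Allocate(slot->len);
    std::memcpy(dst, slot->ptr, slot->len);
    slot->ptr = dst;  // repoint the slot at the copied data
  }
}
{code}

Note that the copy would only be attempted for rows coming from plain encoded pages; rows backed by a dictionary would keep pointing into the dictionary buffer as they do today.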