Csaba Ringhofer created IMPALA-13949:
----------------------------------------

             Summary: Release plain encoded string buffers earlier if all rows 
are dropped
                 Key: IMPALA-13949
                 URL: https://issues.apache.org/jira/browse/IMPALA-13949
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Csaba Ringhofer


Currently Impala always attaches the data page buffers of plain encoded strings 
to the next row batch, even if the predicates discarded all rows in the given 
page. This can lead to surprising memory consumption in very selective 
queries.

{code}
set mt_dop=0; set num_nodes=1; set batch_size=1;
select l_comment from tpch_parquet.lineitem where l_comment like "%nomatch";
{code}
From the profile:
{code}
         - RowBatchBytesEnqueued: 174.75 MB (183239804)
         - RowBatchQueuePeakMemoryUsage: 8.06 MB (8453100)
         - RowBatchesEnqueued: 29 (29)
         - RowsRead: 6.00M (6001215)
         - RowsReturned: 0 (0)
{code}
What happens above is that the RowBatch hits AtCapacity() because the attached 
buffers reach the 8MB memory limit, so 29 row batches with 0 rows are 
returned. This has a performance impact because these buffers are freed on a 
different thread (in the mt_dop=0 case) than the one that allocated them.
https://github.com/apache/impala/blob/f222574f04fc7b94e1ad514d7d720d50d036a226/be/src/exec/parquet/hdfs-parquet-scanner.cc#L2460
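
The effect can be illustrated with a toy model. This is only a sketch, not 
Impala's RowBatch: ToyRowBatch, CountEmptyBatches and their parameters are 
hypothetical names, and the capacity rule (row count reached, or attached bytes 
reach a limit) is an assumption based on the description above.

```cpp
#include <cstdint>
#include <vector>

// Toy model (not Impala's RowBatch): a batch is "at capacity" when it holds
// enough rows OR when the buffers attached to it reach a byte limit.
struct ToyRowBatch {
  ToyRowBatch(int64_t row_capacity, int64_t attached_bytes_limit)
    : row_capacity(row_capacity), attached_bytes_limit(attached_bytes_limit) {}

  bool AtCapacity() const {
    return num_rows >= row_capacity || attached_bytes >= attached_bytes_limit;
  }

  int64_t row_capacity;
  int64_t attached_bytes_limit;
  int64_t num_rows = 0;
  int64_t attached_bytes = 0;
};

// Counts the batches that are emitted with zero rows when every page buffer
// is attached unconditionally and the predicates drop every row.
int64_t CountEmptyBatches(const std::vector<int64_t>& page_sizes,
    int64_t row_capacity, int64_t attached_bytes_limit) {
  ToyRowBatch batch(row_capacity, attached_bytes_limit);
  int64_t empty_batches = 0;
  for (int64_t page_bytes : page_sizes) {
    // No row survives, so num_rows stays 0, but the buffer is still attached.
    batch.attached_bytes += page_bytes;
    if (batch.AtCapacity()) {
      if (batch.num_rows == 0) ++empty_batches;
      batch = ToyRowBatch(row_capacity, attached_bytes_limit);  // next batch
    }
  }
  return empty_batches;
}
```

In this model, sixteen 1 MiB pages with an 8 MiB limit produce two empty 
batches, while seven such pages never fill a batch.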

A solution could be to copy the strings if the predicate is very selective. 
What complicates this is that the copy is only useful for plain encoding, not 
dictionary encoding, and theoretically a single scratch batch can have rows 
from both dictionary and plain encoded pages. Also, as a page may fill 
multiple row batches, it is possible that a selective batch is followed by a 
non-selective one, where attaching the buffer still makes sense.
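
One possible shape for such a decision, as a minimal sketch only. 
PageEncoding, ShouldCopyStrings and max_copy_fraction are hypothetical names 
that do not exist in Impala, and the threshold rule is an assumption, not a 
proposed implementation.

```cpp
#include <cstdint>

// Hypothetical encoding tag for the page that the strings came from.
enum class PageEncoding { kPlain, kDictionary };

// Returns true if copying the surviving strings (so that the page buffer can
// be released early) looks cheaper than attaching the whole buffer.
// max_copy_fraction is an assumed tuning knob, e.g. 0.1 for 10%.
bool ShouldCopyStrings(PageEncoding encoding, int64_t page_buffer_bytes,
    int64_t surviving_string_bytes, double max_copy_fraction) {
  // Copying only helps for plain encoding, as noted above.
  if (encoding != PageEncoding::kPlain) return false;
  if (page_buffer_bytes <= 0) return false;
  // Copy only if the survivors are a small part of the page buffer.
  return static_cast<double>(surviving_string_bytes)
      <= max_copy_fraction * static_cast<double>(page_buffer_bytes);
}
```

The sketch looks at a single page in isolation. It does not handle the cases 
mentioned above: a scratch batch that mixes encodings, or a page that spans 
several row batches with different selectivity.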

Another thing that could affect this is small string optimization. Currently 
it is not applied in the Parquet scanner, but if all rows in a data page were 
smallified, the original buffer could still be dropped.
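
The condition for dropping the buffer could be checked roughly as below. This 
is a sketch under assumptions: AllSurvivorsInline is a hypothetical helper, and 
inline_limit stands for the largest length that the small string 
representation can store inline, which is passed in rather than assumed here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every surviving string is short enough
// to be stored inline, so no value would point into the page buffer anymore.
// An empty list (all rows dropped) also returns true.
bool AllSurvivorsInline(const std::vector<std::size_t>& surviving_lengths,
    std::size_t inline_limit) {
  return std::all_of(surviving_lengths.begin(), surviving_lengths.end(),
      [inline_limit](std::size_t len) { return len <= inline_limit; });
}
```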



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
