Hi Rossi,
When profiling Acero, we noticed a lot of overhead from memory allocation,
specifically from the creation and destruction of std::vector.
The thread-local data in QueryContext was put there in preparation for
refactoring other nodes to use TempVectorStack whenever they need a temporary
block of memory.
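
To illustrate the general idea, here is a minimal sketch. It is not Arrow's actual TempVectorStack or QueryContext code; the names ScratchStack, ThreadScratch and CountEven are made up for this example, and the 64 KiB capacity is an arbitrary assumption. It shows a per-thread buffer that is allocated once and then handed out in stack (LIFO) order, so a hot loop does not construct and destroy a std::vector on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the idea of a temporary-memory stack.
// One buffer is allocated up front; blocks are carved from it in LIFO order.
// Byte-granular only: this sketch does no alignment handling.
class ScratchStack {
 public:
  explicit ScratchStack(std::size_t capacity) : buffer_(capacity), top_(0) {}

  // Hand out `size` bytes from the top of the stack, or nullptr if it is full.
  std::uint8_t* Push(std::size_t size) {
    if (size > buffer_.size() - top_) return nullptr;
    std::uint8_t* block = buffer_.data() + top_;
    top_ += size;
    return block;
  }

  // Release the most recently pushed `size` bytes (must mirror the Push).
  void Pop(std::size_t size) {
    assert(size <= top_);
    top_ -= size;
  }

  std::size_t used() const { return top_; }

 private:
  std::vector<std::uint8_t> buffer_;  // allocated once, reused afterwards
  std::size_t top_;
};

// One stack per thread, created lazily on that thread's first call.
// The capacity is an arbitrary value chosen for this sketch.
inline ScratchStack& ThreadScratch() {
  thread_local ScratchStack stack(64 * 1024);
  return stack;
}

// Example consumer: needs n temporary bytes, takes them from the
// per-thread stack instead of building a std::vector on each call.
std::int64_t CountEven(const std::int32_t* values, std::size_t n) {
  ScratchStack& stack = ThreadScratch();
  std::uint8_t* flags = stack.Push(n);
  if (flags == nullptr) return -1;  // sketch only: real code would report an error

  for (std::size_t i = 0; i < n; ++i) flags[i] = (values[i] % 2 == 0) ? 1 : 0;

  std::int64_t count = 0;
  for (std::size_t i = 0; i < n; ++i) count += flags[i];

  stack.Pop(n);  // give the block back before returning
  return count;
}
```

The only heap allocation happens when a thread first touches its stack; after that, Push and Pop just move an offset.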

Hope this helps,
Sasha 

> On 9 March 2023, at 09:11, Ruoxi Sun <zanmato1...@gmail.com> wrote:
> 
> Hi folks,
> 
> I see that the member `tld_
> <https://github.com/apache/arrow/blob/0ac0f733ff61f2db45cbff54def8768b3ceb8a9d/cpp/src/arrow/compute/exec/query_context.h#L150>`
> in class `QueryContext` is used by `BloomFilterPushdownContext
> <https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/exec/hash_join_node.cc>`
> and `SwissJoin
> <https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/exec/swiss_join.cc>`,
> both of which are parts of hash join node.
> 
> I'm wondering if there is any particular reason to design it this way. It
> seems reasonable to move it into hash join node - in which there exists
> per-thread states - and subsequently pass it down to the
> `BloomFilterPushdownContext` and `SwissJoin`. This way the `QueryContext`
> could stay thread-local-state-agnostic.
> 
> Please help. Thanks in advance.
> 
> *Rossi Sun*
