Re: Question about thread local data in `QueryContext`

Ruoxi Sun Thu, 09 Mar 2023 23:46:52 -0800

Got it, that makes sense.

Thanks for the answer, really appreciate it!


*Rossi Sun*


Sasha Krassovsky <krassovskysa...@gmail.com> wrote on Fri, Mar 10, 2023 at 14:38:

> Hi Rossi,
> It is meant to be used by every node that needs a temporary array. Other
> nodes don't use it yet because we haven't performed that refactor.
>
> Sasha
>
> > On Mar 9, 2023, at 21:57, Ruoxi Sun <zanmato1...@gmail.com> wrote:
> >
> > Hi Sasha, thanks for the kind reply. Yeah, using thread local data to
> > reduce the vector allocation/deallocation overhead makes sense.
> > However, I'm still wondering whether this thread local data has to be in
> > QueryContext. Specifically, there is already thread local state
> > <https://github.com/apache/arrow/blob/ad44e8e4e669019299dc56b37d24d2976588b648/cpp/src/arrow/compute/exec/swiss_join.cc#L2505>
> > within SwissJoin, so does it make sense to put the thread local vector
> > inside SwissJoin rather than QueryContext? Or is the thread local data in
> > QueryContext designed to be used inter-node?
> >
> > Thanks.
> >
> > *Rossi Sun*
> >
> >
> > Sasha Krassovsky <krassovskysa...@gmail.com> wrote on Fri, Mar 10, 2023 at 01:54:
> >
> >> Hi Rossi,
> >> When profiling Acero we noticed a lot of overhead from memory
> >> allocation, specifically in the creation/destruction of std::vector.
> >> This thread local data in QueryContext was put there in preparation for
> >> refactoring other nodes to use TempVectorStack when they need a temporary
> >> block of memory.
> >>
> >> Hope this helps,
> >> Sasha
> >>
> >>> On Mar 9, 2023, at 09:11, Ruoxi Sun <zanmato1...@gmail.com> wrote:
> >>>
> >>> Hi folks,
> >>>
> >>> I see that the member `tld_`
> >>> <https://github.com/apache/arrow/blob/0ac0f733ff61f2db45cbff54def8768b3ceb8a9d/cpp/src/arrow/compute/exec/query_context.h#L150>
> >>> in class `QueryContext` is used by `BloomFilterPushdownContext`
> >>> <https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/exec/hash_join_node.cc>
> >>> and `SwissJoin`
> >>> <https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/exec/swiss_join.cc>,
> >>> both of which are parts of the hash join node.
> >>>
> >>> I'm wondering if there is any particular reason to design it this way.
> >>> It seems reasonable to move it into the hash join node, which already
> >>> has per-thread state, and then pass it down to
> >>> `BloomFilterPushdownContext` and `SwissJoin`. This way `QueryContext`
> >>> could stay thread-local-state-agnostic.
> >>>
> >>> Please help. Thanks in advance.
> >>>
> >>> *Rossi Sun*
> >>
>
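
For readers following the thread, here is a minimal sketch of the idea Sasha
describes: each thread gets one preallocated scratch stack, and any node
borrows temporary arrays from it instead of constructing and destroying a
std::vector on every call. All names below (`ScratchStack`, `ScratchVector`,
`ToyContext`, `SumOfSquares`) are hypothetical and chosen for illustration.
This is not Arrow's `TempVectorStack` or `QueryContext` API, and indexing the
stacks by a caller-supplied thread index is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity scratch stack; NOT Arrow's TempVectorStack.
class ScratchStack {
 public:
  explicit ScratchStack(std::size_t capacity_bytes)
      : buffer_(capacity_bytes), top_(0) {}

  // Reserve bytes from the top of the stack. No heap allocation happens here.
  std::uint8_t* Push(std::size_t num_bytes) {
    const std::size_t padded = Pad(num_bytes);
    assert(top_ + padded <= buffer_.size() && "scratch stack overflow");
    std::uint8_t* out = buffer_.data() + top_;
    top_ += padded;
    return out;
  }

  // Release the most recent reservation; callers must release in LIFO order.
  void Pop(std::size_t num_bytes) {
    const std::size_t padded = Pad(num_bytes);
    assert(padded <= top_);
    top_ -= padded;
  }

  std::size_t bytes_in_use() const { return top_; }

 private:
  // Round sizes up to a multiple of 8 so every reservation stays 8-byte aligned.
  static std::size_t Pad(std::size_t n) { return (n + 7) / 8 * 8; }

  std::vector<std::uint8_t> buffer_;
  std::size_t top_;
};

// RAII view over a temporary array borrowed from a ScratchStack.
// Intended for trivially copyable element types only.
template <typename T>
class ScratchVector {
 public:
  ScratchVector(ScratchStack* stack, std::size_t length)
      : stack_(stack),
        length_(length),
        data_(reinterpret_cast<T*>(stack->Push(length * sizeof(T)))) {}
  ~ScratchVector() { stack_->Pop(length_ * sizeof(T)); }

  ScratchVector(const ScratchVector&) = delete;
  ScratchVector& operator=(const ScratchVector&) = delete;

  T* data() { return data_; }
  std::size_t size() const { return length_; }

 private:
  ScratchStack* stack_;
  std::size_t length_;
  T* data_;
};

// Hypothetical query-wide context that owns one scratch stack per thread,
// so any node can borrow scratch space by thread index.
class ToyContext {
 public:
  ToyContext(std::size_t num_threads, std::size_t bytes_per_thread) {
    stacks_.reserve(num_threads);
    for (std::size_t i = 0; i < num_threads; ++i) {
      stacks_.emplace_back(bytes_per_thread);
    }
  }
  ScratchStack& stack_for(std::size_t thread_index) {
    return stacks_.at(thread_index);
  }

 private:
  std::vector<ScratchStack> stacks_;
};

// Example "node" work that needs a temporary array of n integers.
inline std::uint64_t SumOfSquares(ScratchStack& stack, std::uint32_t n) {
  ScratchVector<std::uint32_t> tmp(&stack, n);  // borrowed, not heap-allocated
  for (std::uint32_t i = 0; i < n; ++i) tmp.data()[i] = i * i;
  std::uint64_t sum = 0;
  for (std::uint32_t i = 0; i < n; ++i) sum += tmp.data()[i];
  return sum;  // tmp's destructor returns the space to the stack
}
```

Keeping the stacks in a query-wide object rather than inside a single node
means every node running on a given thread can share the same buffer, which
appears to be the motivation for placing the data in `QueryContext`. Keeping
it inside the hash join node, as Rossi suggests, would work for that node but
would not serve other nodes after the planned refactor.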
