2010YOUY01 commented on PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2287796117
> > I think I'm not so familiar with `Emit::First`, and there is no blocked
> > implementation done yet. Could we emit values every time we accumulate a
> > block's worth? Something like `Emit::First(block_size)`.
> > We have `emit_early_if_necessary`, which does a `First(n)` emission when
> > its conditions are met:
> > ```rust
> > fn emit_early_if_necessary(&mut self) -> Result<()> {
> >     if self.group_values.len() >= self.batch_size
> >         && matches!(self.group_ordering, GroupOrdering::None)
> >         && matches!(self.mode, AggregateMode::Partial)
> >         && self.update_memory_reservation().is_err()
> >     {
> >         let n = self.group_values.len() / self.batch_size * self.batch_size;
> >         let batch = self.emit(EmitTo::First(n), false)?;
> >         self.exec_state = ExecutionState::ProducingOutput(batch);
> >     }
> >     Ok(())
> > }
> > ```
> > If we emit every time we accumulate a block's worth of values, is it
> > similar to the blocked approach? If not, what is the difference?
> > Upd: One difference I can think of is that in the blocked approach we keep
> > all the accumulated values, so we can optimize based on **all the values**
> > we have, while in `Emit::First` mode we emit partial values early, and
> > therefore lose the chance to optimize based on **all the values** 🤔 ?
>
> Ok, I think I got it now: if we constantly `emit first n` whenever the group
> count reaches the `batch size`, it is effectively equivalent to the blocked
> approach.
>
> But as far as I know, `emit first n` is only triggered in some special
> cases:
>
> * Streaming aggregation (the truly constant `emit first n` case, and I was
> forced to disable blocked mode in this case)
> * In the `Partial` operator when the memory limit is exceeded
> (`emit_early_if_necessary`)
>
> In other cases, we currently need to poll to the end, then `emit all` and
> use the `slice` method to split the result into batches before returning:
>
> * For example, in the `FinalPartitioned` operator
> * In the `Partial` operator when memory is under the memory limit
> * Other cases...
>
> And in such cases, the blocked approach may be efficient for both memory and
> CPU, as stated above?
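For context on the quoted `emit_early_if_necessary`, the line `n = self.group_values.len() / self.batch_size * self.batch_size` floors the group count to a multiple of the batch size, so only full-sized batches are emitted early and the remainder stays buffered. A standalone sketch of just that arithmetic (not DataFusion code):

```rust
// Floor `len` to the nearest multiple of `batch_size`; integer division
// truncates, so `len / batch_size * batch_size` drops the partial batch.
fn full_batches_len(len: usize, batch_size: usize) -> usize {
    len / batch_size * batch_size
}

fn main() {
    // 2500 groups with batch_size 1024: emit 2048 early, keep 452 buffered.
    assert_eq!(full_batches_len(2500, 1024), 2048);
    // Fewer groups than one batch: nothing to emit early.
    assert_eq!(full_batches_len(900, 1024), 0);
    println!("ok");
}
```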
I still don't understand this `FirstBlocks` variant: if the partial aggregate
runs out of memory, outputting the first N blocks and letting the final
aggregation process them first doesn't look like it will help much with total
memory usage (since it's a high-cardinality aggregation).
Is it related to the spilling implementation? I will check it later.
Also, thanks for pushing this forward; I think this approach is promising for
performance.
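As a rough illustration of the "poll to end, `emit all`, then `slice` into batches" pattern described in the quoted discussion, here is a standalone sketch using plain vectors (the real code slices Arrow `RecordBatch`es; the names below are made up for illustration):

```rust
// Hypothetical stand-in for the single large result produced by `emit all`
// after polling the input to the end.
fn emit_all() -> Vec<u64> {
    (0..10).collect()
}

fn main() {
    let batch_size = 4;
    let all = emit_all();
    // Split the one large result into batch-sized output chunks, analogous
    // to calling `slice` repeatedly on a RecordBatch before returning.
    let batches: Vec<&[u64]> = all.chunks(batch_size).collect();
    assert_eq!(batches.len(), 3); // 4 + 4 + 2 values
    assert_eq!(batches[2], &[8, 9]); // the final, partial batch
    println!("ok");
}
```

The contrast with the blocked approach is that here every value must be resident in one buffer before any output is produced, whereas blocked emission can hand out full blocks as they are completed.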
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]