Rachelint commented on PR #11319: URL: https://github.com/apache/datafusion/pull/11319#issuecomment-2212686882
> > I suspect the remaining cases where we are using collect could be made more efficient using the Builder pattern?
>
> I think the reason the Builder is faster for Strings / Binary is that due to how the references worked out, we can avoid copying the strings via `value.to_string()`
>
> I don't think the Builder pattern is fundamentally better/worse than the `from_iter` (under the covers they all end up doing the same thing in arrow-rs I think)

Yes, I tested the `UInt64Array` case in my POC: using the `Builder` directly is actually a bit slower than using `from_iter`.

https://github.com/Rachelint/arrow-datafusion/blob/70b9f05e737e81c514259e70a8bd2e9f0ad8e725/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L790
