tlm365 commented on code in PR #14025:
URL: https://github.com/apache/datafusion/pull/14025#discussion_r1905010563

##########
datafusion/functions/src/unicode/reverse.rs:
##########
@@ -116,14 +115,23 @@ pub fn reverse<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
     }
 }
 
-fn reverse_impl<'a, T: OffsetSizeTrait, V: ArrayAccessor<Item = &'a str>>(
+fn reverse_impl<'a, T: OffsetSizeTrait, V: StringArrayType<'a>>(
     string_array: V,
 ) -> Result<ArrayRef> {
-    let result = ArrayIter::new(string_array)
-        .map(|string| string.map(|string: &str| string.chars().rev().collect::<String>()))
-        .collect::<GenericStringArray<T>>();
+    let mut builder: GenericStringBuilder<T> =
+        GenericStringBuilder::with_capacity(string_array.len(), 1024);

Review Comment:
   @2010YOUY01 Thanks for reviewing,
   
   > I think we can use the actual data size here for pre-allocation, instead of a constant 1024, the complexity of adding another argument for array size seems reasonable
   
   I agree that it would be better to pre-allocate the actual data size here, but I think it is difficult to compute accurately, since it depends on context. Keeping it simple here seems reasonable as well. Currently `GenericStringBuilder` has `new` and `with_capacity` for initializing a new builder, and 1024 is the default size when using `GenericStringBuilder::new` ([ref](https://github.com/apache/arrow-rs/blob/4f1f6e57c568fae8233ab9da7d7c7acdaea4112a/arrow-array/src/builder/generic_bytes_builder.rs#L39-L41)), which is why I chose 1024 here.
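
To make the pre-allocation trade-off concrete, here is a minimal std-only sketch, not the code from the PR. A plain `String` stands in for the builder's data buffer, and `reverse_into_buffer` and `exact_data_capacity` are hypothetical helper names introduced only for this illustration. It relies on one fact about `reverse` specifically: reversing by `char` keeps every code point's UTF-8 encoding, so the output needs exactly as many bytes as the input. Null handling and how that byte total would be obtained from an Arrow array are left out.

```rust
/// Reverse each string by `char`, appending into one shared buffer that is
/// pre-allocated with `data_capacity` bytes. Returns the buffer and the end
/// offset of each value. (Hypothetical stand-in for a string builder.)
fn reverse_into_buffer(values: &[&str], data_capacity: usize) -> (String, Vec<usize>) {
    let mut data = String::with_capacity(data_capacity);
    let mut offsets = Vec::with_capacity(values.len());
    for value in values {
        // Append the characters of this value in reverse order.
        data.extend(value.chars().rev());
        offsets.push(data.len());
    }
    (data, offsets)
}

/// Exact number of bytes the reversed output needs. Reversing by `char`
/// reorders code points without re-encoding them, so the total byte length
/// equals the sum of the input byte lengths.
fn exact_data_capacity(values: &[&str]) -> usize {
    values.iter().map(|v| v.len()).sum()
}
```

Calling `reverse_into_buffer(&values, 1024)` corresponds to the fixed default discussed above, while `reverse_into_buffer(&values, exact_data_capacity(&values))` sizes the buffer exactly. Both produce the same output; only the up-front allocation differs.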