friendlymatthew commented on PR #15361: URL: https://github.com/apache/datafusion/pull/15361#issuecomment-2783424002
It may be beneficial to leave the existing code paths undisturbed and decide whether to retry on err. It avoids the overhead of custom `chrono::Strftime::parse` validation. Plus, the retry logic for `to_char_scalar` is simple to implement. Something like: <details> <summary>to_char_scalar</summary> <br> ```rust let formatter = ArrayFormatter::try_new(array.as_ref(), &format_options)?; let formatted: Result<Vec<Option<String>>, ArrowError> = (0..array.len()) .map(|i| { if array.is_null(i) { Ok(None) } else { formatter.value(i).try_to_string().map(Some) } }) .collect(); /* will stop iterating upon the first err and since we're dealing with one format string, it'll halt after the first format attempt */ if let Ok(formatted) = formatted { if is_scalar_expression { Ok(ColumnarValue::Scalar(ScalarValue::Utf8( formatted.first().unwrap().clone(), ))) } else { Ok(ColumnarValue::Array( Arc::new(StringArray::from(formatted)) as ArrayRef )) } } else { // if the format attempt is with a Date32, it's possible the attempt failed because // the format string contained time-specifiers, so we'll retry as Date64s if data_type == &Date32 { return to_char_scalar(expression.clone().cast_to(&Date64, None)?, format); } exec_err!("{}", formatted.unwrap_err()) } ``` </details> `to_char_array` is a bit more complex (see https://github.com/apache/datafusion/pull/15361#discussion_r2015455219), but I'll think it through. FWIW, it's worth implementing the retry logic and benchmarking the performance. I'm happy to do the work. The decision between eager casting and selective retry likely comes down to whether we're willing to slightly slow down existing behavior in exchange for better performance in the new case. But I'm just spitballing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org