martin-g commented on code in PR #19570:
URL: https://github.com/apache/datafusion/pull/19570#discussion_r2654838024
##########
datafusion/functions/src/string/split_part.rs:
##########
@@ -219,22 +219,22 @@ where
.try_for_each(|((string, delimiter), n)| -> Result<(), DataFusionError> {
match (string, delimiter, n) {
(Some(string), Some(delimiter), Some(n)) => {
- let split_string: Vec<&str> = string.split(delimiter).collect();
- let len = split_string.len();
-
- let index = match n.cmp(&0) {
- std::cmp::Ordering::Less => len as i64 + n,
+ let result = match n.cmp(&0) {
+ std::cmp::Ordering::Greater => {
+ // Positive index: use nth() to avoid collecting all parts
+ // This stops iteration as soon as we find the nth element
+ string.split(delimiter).nth((n - 1) as usize)
Review Comment:
Are 32-bit targets supported?
`n` is an Int64, so on a 32-bit target the `as usize` cast can silently
truncate a large value and select the wrong part.
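
One possible way to avoid the lossy cast is a checked conversion. This is only a sketch: `nth_part_positive` is a hypothetical helper name, not something in the PR, and it assumes that an index which does not fit in `usize` can be treated like any other out-of-range index (no such part exists).

```rust
/// Hypothetical helper, not part of the PR: returns the n-th part (1-based)
/// of `string` split by `delimiter`, or None if there is no such part.
fn nth_part_positive<'a>(string: &'a str, delimiter: &str, n: i64) -> Option<&'a str> {
    // checked_sub guards the subtraction; usize::try_from fails instead of
    // truncating when the value is negative or too large for the target.
    let idx = usize::try_from(n.checked_sub(1)?).ok()?;
    // nth() still stops iterating as soon as the requested part is found.
    string.split(delimiter).nth(idx)
}
```

With this shape, `n = 0` and values that do not fit in `usize` both yield `None`, so the caller decides whether that maps to an empty string or an error.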
##########
datafusion/functions/src/string/split_part.rs:
##########
@@ -219,22 +219,22 @@ where
.try_for_each(|((string, delimiter), n)| -> Result<(), DataFusionError> {
match (string, delimiter, n) {
(Some(string), Some(delimiter), Some(n)) => {
- let split_string: Vec<&str> = string.split(delimiter).collect();
- let len = split_string.len();
-
- let index = match n.cmp(&0) {
- std::cmp::Ordering::Less => len as i64 + n,
+ let result = match n.cmp(&0) {
+ std::cmp::Ordering::Greater => {
+ // Positive index: use nth() to avoid collecting all parts
+ // This stops iteration as soon as we find the nth element
+ string.split(delimiter).nth((n - 1) as usize)
+ }
+ std::cmp::Ordering::Less => {
+ // Negative index: use rsplit().nth() to efficiently get from the end
+ // rsplit iterates in reverse, so -1 means first from rsplit (index 0)
+ string.rsplit(delimiter).nth((-n - 1) as usize)
Review Comment:
Another corner case: `-n` overflows for `i64::MIN`, which panics in debug
builds and wraps in release builds.
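
A possible way around the negation is `i64::unsigned_abs`, which is defined for every `i64` including `i64::MIN`. This is a sketch only: `nth_part_negative` is a hypothetical helper name, not something in the PR, and it assumes an unreachable index should simply yield no part.

```rust
/// Hypothetical helper, not part of the PR: returns the part at a negative
/// 1-based index (-1 is the last part), or None if there is no such part.
fn nth_part_negative<'a>(string: &'a str, delimiter: &str, n: i64) -> Option<&'a str> {
    if n >= 0 {
        return None;
    }
    // unsigned_abs never overflows: i64::MIN maps to 2^63 as a u64.
    // n < 0 here, so the magnitude is at least 1 and the subtraction is safe.
    let idx = usize::try_from(n.unsigned_abs() - 1).ok()?;
    // rsplit iterates from the end, so index 0 is the last part.
    string.rsplit(delimiter).nth(idx)
}
```

For `i64::MIN` this returns `None` instead of panicking or wrapping, because no string can have that many parts.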
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]