kosiew commented on code in PR #20922:
URL: https://github.com/apache/datafusion/pull/20922#discussion_r2964937756
##########
datafusion/physical-plan/src/sorts/builder.rs:
##########
@@ -200,3 +229,143 @@ pub(crate) fn try_grow_reservation_to_at_least(
}
Ok(())
}
+
+/// Returns true if the error is an Arrow offset overflow.
+fn is_offset_overflow(e: &DataFusionError) -> bool {
+ matches!(
+ e,
+ DataFusionError::ArrowError(boxed, _)
+ if matches!(boxed.as_ref(), ArrowError::OffsetOverflowError(_))
+ )
+}
+
+fn offset_overflow_error() -> DataFusionError {
+ DataFusionError::ArrowError(Box::new(ArrowError::OffsetOverflowError(0)),
None)
+}
+
+fn recover_offset_overflow_from_panic<T, F>(f: F) -> Result<T>
Review Comment:
The retry behavior looks good, but right now it seems like it’s only covered
through synthetic helper failures.
Since the production path depends on matching Arrow’s panic payload pretty
closely, I think it’d be great to add one higher-level regression test closer
to `BatchBuilder::build_record_batch()` or `SortPreservingMergeStream` that
exercises the retry/drain flow end-to-end through an injectable interleave hook.
That would make it a lot easier to catch future Arrow-side panic-message
changes - or refactors in this file before they slip through.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]