Yicong-Huang opened a new pull request, #54398: URL: https://github.com/apache/spark/pull/54398
## What changes were proposed in this pull request?

Remove the `is_udtf` parameter from `PandasToArrowConversion.convert()` and unify the error handling logic for both UDF and UDTF conversions.

**Key changes**:
- Removed `is_udtf: bool` parameter from conversion methods
- Unified exception handling: all conversions now use broad `ArrowException` catching
- Replaced UDTF-specific `UDTF_ARROW_TYPE_CAST_ERROR` with generic `PySparkTypeError`/`PySparkValueError`
- Simplified code by removing duplicate error handling branches (-39 lines)

## Why are the changes needed?

Part of [SPARK-55502](https://issues.apache.org/jira/browse/SPARK-55502).

The `is_udtf` flag was used to differentiate error handling between UDF and UDTF, but this created unnecessary complexity and inconsistent error messages. Unifying the logic provides:
- Simpler, more maintainable code
- Consistent error messages across UDF/UDTF
- Better type coercion flexibility for all conversions

## Does this PR introduce any user-facing change?

No. Error messages change from UDTF-specific to generic, but functionality remains the same.

## How was this patch tested?

- Existing tests: 545+ Python tests pass (conversion, UDF, UDTF, grouped map, etc.)
- All type conversion scenarios validated
- No behavioral changes to existing functionality

## Was this patch authored or co-authored using generative AI tooling?

No.
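
For readers unfamiliar with the pattern, the unified error handling listed under "Key changes" can be illustrated with a minimal, self-contained sketch. This is not the code from the patch. The exception classes below are stand-ins loosely modeled on an Arrow-style hierarchy, `fake_arrow_cast` and `convert` are hypothetical helpers, and the mapping of failures to built-in `TypeError`/`ValueError` (in place of `PySparkTypeError`/`PySparkValueError`) is an assumption made for illustration.

```python
# Stand-in exception hierarchy (assumption): NOT the real pyarrow or PySpark classes.
class ArrowException(Exception):
    """Stand-in for a broad Arrow base exception."""


class ArrowInvalid(ValueError, ArrowException):
    """Stand-in for a value-level conversion failure."""


class ArrowTypeError(TypeError, ArrowException):
    """Stand-in for a type-level conversion failure."""


def fake_arrow_cast(values, target):
    """Hypothetical stand-in for an Arrow cast: apply `target` to each value."""
    out = []
    for v in values:
        if not isinstance(v, (int, float, str)):
            raise ArrowTypeError(f"unsupported input type {type(v).__name__}")
        try:
            out.append(target(v))
        except ValueError as e:
            raise ArrowInvalid(str(e)) from e
    return out


def convert(values, target):
    """Single conversion path: no `is_udtf`-style flag selects the error branch."""
    try:
        return fake_arrow_cast(values, target)
    except ArrowException as e:
        # One broad catch for every caller, re-raised as a generic error.
        # Which generic error is chosen here is an assumed mapping.
        message = f"Cannot convert values to {target.__name__}: {e}"
        if isinstance(e, TypeError):
            raise TypeError(message) from e
        raise ValueError(message) from e


if __name__ == "__main__":
    print(convert(["1", "2"], int))
    for bad in (["x"], [None]):
        try:
            convert(bad, int)
        except (TypeError, ValueError) as err:
            print(type(err).__name__, "-", err)
```

Because every caller goes through the same `except` branch, UDF-style and UDTF-style callers in this sketch see identical error types and messages, which is the consistency the description argues for.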
