HyukjinKwon opened a new pull request, #50658: URL: https://github.com/apache/spark/pull/50658
### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/50600. It proposes to use `RecordBatch.schema.names` instead of `RecordBatch.column_names` for compatibility with old PyArrow versions. `RecordBatch.column_names` is available from PyArrow 13.0 (https://arrow.apache.org/docs/13.0/python/generated/pyarrow.RecordBatch.html).

### Why are the changes needed?

To keep compatibility with old PyArrow versions. The build is currently broken (https://github.com/apache/spark/actions/runs/14570805420/job/40867709859):

```
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 2178, in main
    process()
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 2170, in process
    serializer.dump_stream(out_iter, outfile)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 1470, in dump_stream
    return ArrowStreamUDFSerializer.dump_stream(self, flatten_iterator(), stream)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 181, in dump_stream
    return super(ArrowStreamUDFSerializer, self).dump_stream(wrap_and_init_stream(), stream)
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 120, in dump_stream
    for batch in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 166, in wrap_and_init_stream
    for batch, _ in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 1455, in flatten_iterator
    for packed in iterator:
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 1437, in load_stream
    for k, g in groupby(data_batches, key=lambda x: x[0]):
  File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 1424, in generate_data_batches
    DataRow = Row(*(batch.column_names))
AttributeError: 'pyarrow.lib.RecordBatch' object has no attribute 'column_names'
```

### Does this PR introduce _any_ user-facing change?

No. The main change has not been released yet.

### How was this patch tested?

Unit tests in this PR, and the scheduled build.

### Was this patch authored or co-authored using generative AI tooling?

No.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
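
For reference, the compatibility idea described in this PR can be sketched roughly as follows. This is an illustration only, not the actual patch: the helper name `batch_column_names` is hypothetical, and a plain stand-in object is used in place of a real `pyarrow.RecordBatch` so the snippet runs without PyArrow installed. The stand-in mimics an old PyArrow batch, which exposes `schema.names` but has no `column_names` attribute.

```python
from types import SimpleNamespace


def batch_column_names(batch):
    """Return the column names of a record batch via its schema.

    Hypothetical helper: reading `batch.schema.names` avoids relying on
    `batch.column_names`, which the PR notes is available from PyArrow 13.0.
    """
    return list(batch.schema.names)


# Stand-in for a RecordBatch from an old PyArrow version (an assumption for
# illustration): it has `schema.names` but no `column_names` attribute.
old_style_batch = SimpleNamespace(schema=SimpleNamespace(names=["id", "value"]))

names = batch_column_names(old_style_batch)
print(names)
```

With a real `pyarrow.RecordBatch`, the same `schema.names` access should apply, so a call site that builds a `Row` from the column names would not need to depend on the PyArrow version.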