HyukjinKwon opened a new pull request, #50613: URL: https://github.com/apache/spark/pull/50613
### What changes were proposed in this pull request?

We have a scheduled build that tests the Spark 3.5 client against the Spark 4.0 server, and it currently fails. We should set the legacy conf so the tests pass.

### Why are the changes needed?

The build fails as below (https://github.com/apache/spark/actions/runs/14502535136/job/40685331457):

```
======================================================================
ERROR [0.289s]: test_empty_rows (pyspark.sql.tests.connect.test_parity_arrow_map.ArrowMapParityTests.test_empty_rows)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_arrow_map.py", line 128, in test_empty_rows
    self.assertEqual(self.spark.range(10).mapInArrow(empty_rows, "a int").count(), 0)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 248, in count
    pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1663, in toPandas
    return self._session.client.to_pandas(query)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 873, in to_pandas
    table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error
    self._handle_rpc_error(error)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [ARROW_TYPE_MISMATCH] Invalid schema from SQL_MAP_ARROW_ITER_UDF: expected StructType(StructField(a,IntegerType,true)), got StructType(StructField(a,DoubleType,true)). SQLSTATE: 42K0G
```

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Will monitor the build.

### Was this patch authored or co-authored using generative AI tooling?

No.
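For context on the failure mode: the mismatch comes from how the empty batch yielded by the UDF gets typed. With no values to infer from, the empty pandas column the test produces carries a floating-point dtype, so the server-side check sees `double` where the declared schema says `int`. Below is a minimal sketch of that conversion using only pandas and pyarrow; the explicit `float64` column stands in for the inferred default, and the exact body of `empty_rows` in `test_arrow_map.py` is assumed rather than quoted.

```python
import pandas as pd
import pyarrow as pa

# An empty column has no values to pin down its type; float64 here stands in
# for the default pandas picks, which the server then reports as DoubleType.
empty_df = pd.DataFrame({"a": pd.Series([], dtype="float64")})

batch = pa.RecordBatch.from_pandas(empty_df)
print(batch.schema)  # a: double  -- while mapInArrow(..., "a int") declares int32
```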
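The fix pattern is to relax this check through a legacy conf while the 3.5-client tests run against the 4.0 server. The conf key is not named in the description above, so the snippet below is only a generic sketch of setting a conf on a Spark Connect test session; `"<legacy.conf.key>"` is a placeholder and the real key set by this PR is in the diff.

```python
from pyspark.sql import SparkSession

# Connect to a (locally running) Spark Connect server for the compatibility tests.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# "<legacy.conf.key>" is purely illustrative -- the actual legacy conf is in the PR diff.
spark.conf.set("<legacy.conf.key>", "true")
```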