Correctness Issue for UDT Support in PySpark

Darcy Shen Sat, 24 Apr 2021 21:13:37 -0700

There is a correctness issue in the following code snippet
(https://issues.apache.org/jira/browse/SPARK-35211):


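```
from pyspark.sql import SparkSession
from pyspark.testing.sqlutils import ExamplePoint
import pandas as pd

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Disable Arrow so createDataFrame takes the non-Arrow conversion path.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")

# A pandas column of ExamplePoint objects, which carry a Python UDT.
pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})

# With schema verification disabled, the points come out with
# incorrect values (see SPARK-35211).
df = spark.createDataFrame(pdf, verifySchema=False)
df.show()
```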

I created two PRs to resolve it:

PR 1 of 2: also perform schema verification for the inferred schema
https://github.com/apache/spark/pull/32320

PR 2 of 2: convert numbers properly when schema verification is disabled
https://github.com/apache/spark/pull/32327

I hope to get them reviewed.
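
For context on the two fixes: in pyspark.testing.sqlutils, ExamplePointUDT
declares its SQL type as ArrayType(DoubleType(), False), but its serialize
method returns [obj.x, obj.y] unchanged, so ExamplePoint(1, 1) yields Python
ints where doubles are expected. Below is a hypothetical pure-Python sketch,
not the code from either PR: Point, serialize, verify_double_array and
convert_double_array are illustrative names, and it assumes "verification"
means a strict float check and "number conversion" means coercing ints to
floats.

```python
# Hypothetical sketch: none of these names exist in PySpark.
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for pyspark.testing.sqlutils.ExamplePoint."""
    x: object
    y: object


def serialize(p: Point) -> list:
    # Like ExamplePointUDT.serialize: return the raw coordinates as-is,
    # so Point(1, 1) gives Python ints although the declared SQL type
    # is an array of doubles.
    return [p.x, p.y]


def verify_double_array(values: list) -> None:
    # Verification path (assumed strict): reject any element that is
    # not already a float instead of letting it through silently.
    for v in values:
        if not isinstance(v, float):
            raise TypeError(f"expected float, got {type(v).__name__}: {v!r}")


def convert_double_array(values: list) -> list:
    # Unverified path with proper number conversion: coerce each
    # element to float so it matches the declared double type.
    return [float(v) for v in values]


if __name__ == "__main__":
    raw = serialize(Point(1, 1))
    try:
        verify_double_array(raw)
    except TypeError as e:
        print("verification failed:", e)
    print(convert_double_array(raw))  # [1.0, 1.0]
```

The point of the sketch is that either path surfaces the int/double mismatch
explicitly, rather than passing ints through to a column declared as doubles.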

BTW

Besides the correctness issue, UDT support in PySpark is also missing Arrow
support (https://issues.apache.org/jira/browse/SPARK-34771). I've created a
PR to address it.