[ https://issues.apache.org/jira/browse/SPARK-50815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-50815.
---------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

Issue resolved by pull request 49487
[https://github.com/apache/spark/pull/49487]

> Fix bug where passing null Variants in createDataFrame causes it to fail
> ------------------------------------------------------------------------
>
>                 Key: SPARK-50815
>                 URL: https://issues.apache.org/jira/browse/SPARK-50815
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Harsh Motwani
>            Assignee: Harsh Motwani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Passing "None" as one of the Variant values causes createDataFrame to fail.
> This could also affect other code paths, such as UDFs.
> ```
> >>> from pyspark.sql.types import VariantVal
> >>> spark.createDataFrame([(VariantVal(bytearray([12, 1]), bytearray([1, 0, 0])),), (None,)], "v variant").show()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/session.py", line 1567, in createDataFrame
>     return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/session.py", line 1611, in _create_dataframe
>     if not is_remote_only() and isinstance(data, RDD):
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/session.py", line 1191, in _createFromLocal
>     print("TUPLE DATA: ", tupled_data)
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/session.py", line 1191, in <listcomp>
>     print("TUPLE DATA: ", tupled_data)
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/types.py", line 1493, in toInternal
>     return tuple(
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/types.py", line 1494, in <genexpr>
>     f.toInternal(v) if c else v
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/types.py", line 1095, in toInternal
>     return self.dataType.toInternal(obj)
>   File "/Users/harsh.motwani/spark/python/pyspark/sql/types.py", line 1587, in toInternal
>     assert isinstance(variant, VariantVal)
> AssertionError
> ```
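
The traceback shows the failure path: StructType.toInternal converts each field, StructField.toInternal hands the value straight to the Variant type's toInternal, and its `assert isinstance(variant, VariantVal)` rejects None. The following is a minimal, self-contained sketch of a null-safe converter that lets None pass through. It is not the change made in pull request 49487 and does not import PySpark: `SketchVariantVal`, `variant_to_internal`, and `row_to_internal` are hypothetical names, and the `{"value": ..., "metadata": ...}` internal layout is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchVariantVal:
    """Hypothetical stand-in for pyspark.sql.types.VariantVal (value + metadata bytes)."""
    value: bytes
    metadata: bytes


def variant_to_internal(variant: Optional[Any]) -> Optional[Dict[str, bytes]]:
    """Hypothetical null-safe converter: pass None through instead of asserting."""
    if variant is None:
        # A null Variant stays null; it must not reach the type check below.
        return None
    if not isinstance(variant, SketchVariantVal):
        # Explicit error instead of `assert`, so the check survives `python -O`.
        raise TypeError(f"expected a Variant value, got {type(variant).__name__}")
    # Assumed internal layout: a dict holding the two byte buffers.
    return {"value": variant.value, "metadata": variant.metadata}


def row_to_internal(row: tuple) -> tuple:
    """Convert every column of a row, assuming a schema of Variant columns only,
    mirroring the per-field loop seen in StructType.toInternal in the traceback."""
    return tuple(variant_to_internal(v) for v in row)


# Same shape of input as the report: one non-null Variant row and one null row.
rows = [(SketchVariantVal(bytes([12, 1]), bytes([1, 0, 0])),), (None,)]
converted = [row_to_internal(r) for r in rows]
print(converted)
```

Raising TypeError for non-null, non-Variant input, rather than relying on `assert`, keeps the type check active when Python runs with `-O`, which strips assert statements.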



