[ https://issues.apache.org/jira/browse/SPARK-53358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allison Wang reassigned SPARK-53358:
------------------------------------

    Assignee: Allison Wang

> Improve error message for arrow_udtf output type mismatch
> ---------------------------------------------------------
>
>                 Key: SPARK-53358
>                 URL: https://issues.apache.org/jira/browse/SPARK-53358
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>              Labels: pull-request-available
>
> Currently:
> In [7]: import pyarrow as pa
> In [8]: @arrow_udtf(returnType="x int")
>    ...: class MyArrowUDTF:
>    ...:     def eval(self, batch: pa.RecordBatch):
>    ...:         yield batch.column(0)
> MyArrowUDTF(df.asTable()).show()
> will fail with
> pyspark.errors.exceptions.base.PySparkRuntimeError:
> [UDTF_ARROW_TYPE_CONVERSION_ERROR] Cannot convert the output value of the
> input '[
>   0
> ]' with type 'struct<x:int>' to the specified return type of the column:
> 'struct<x: int32>'. Please check if the data types match and try again.
>
> This is very confusing. We should throw a more user-friendly error message to
> ask users to return correct output (iterator of pa.Table or pa.RecordBatch).