[ https://issues.apache.org/jira/browse/SPARK-51118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925872#comment-17925872 ]
Ruifeng Zheng commented on SPARK-51118:
---------------------------------------

The ML side just uses the public UDF APIs; this seems to be an issue in the UDT fallback.

> Use Arrow python UDF for ml internal code
> -----------------------------------------
>
>                 Key: SPARK-51118
>                 URL: https://issues.apache.org/jira/browse/SPARK-51118
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> When Arrow optimization is enabled for Python UDFs, the areas below fail with
> ValueError: 'tolist' is not in list.
> - CrossValidator Fit
> The issue occurs in test_save_load_nested_estimator, when executing
> cv.fit(dataset).
> For the full stack trace, see
> [https://github.com/xinrong-meng/spark/actions/runs/13167027085/job/36749584932]
> - OneVsRestModel _transform
> For the full stack trace, see
> [https://github.com/xinrong-meng/spark/actions/runs/13188569330/job/36819938746]
> - UnaryTransformer _transform
> Arrow optimization is currently disabled explicitly in those places, but we
> should enable it there in the near future.