Jungtaek Lim created SPARK-50752:
------------------------------------
Summary: Introduce configs for Python UDF execution
Key: SPARK-50752
URL: https://issues.apache.org/jira/browse/SPARK-50752
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Jungtaek Lim
Unlike Pandas UDFs, Python UDFs have no configurations for performance
tuning. This does not mean that input/output is not batched for Python UDFs;
it means the batch size is hard-coded.
Several configurations that are available for Pandas UDFs are mostly relevant
to Python UDFs as well:
* batch size (executor <-> Python worker)
* buffer size for writing to the channel
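
To illustrate what a tunable batch size changes, here is a minimal sketch in
plain Python. It is not Spark's implementation: `batch_rows` and its
`batch_size` parameter are hypothetical names used only for illustration. For
Pandas UDFs, the two settings listed above presumably correspond to
`spark.sql.execution.arrow.maxRecordsPerBatch` and
`spark.sql.execution.pandas.udf.buffer.size`, although this issue does not
name them.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_rows(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group rows into lists of at most batch_size items.

    Hypothetical helper: it only mimics the idea of sending rows from the
    executor to the Python worker in batches, not Spark's actual code path.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# With a hard-coded size, the number of batches is fixed for a given input.
# Making the size configurable lets the user trade the number of round trips
# against the size of each transfer.
small = list(batch_rows(range(10), batch_size=3))
large = list(batch_rows(range(10), batch_size=5))
print(len(small), len(large))
```

The same ten rows produce more, smaller batches with `batch_size=3` than with
`batch_size=5`, which is the trade-off a batch size configuration would expose.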