Jungtaek Lim created SPARK-50752:
------------------------------------

             Summary: Introduce configs for Python UDF execution
                 Key: SPARK-50752
                 URL: https://issues.apache.org/jira/browse/SPARK-50752
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, SQL
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim


Unlike Pandas UDF, Python UDF has no configurations for performance tuning.
This does not mean that input/output is not batched for Python UDF; it means
that the batch size is hard-coded.

Some configurations that are available for Pandas UDF are mostly relevant to
Python UDF as well:
 * batch size (executor <-> Python worker)
 * buffer size for writing to the channel
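
To illustrate why the batch size is worth exposing, here is a minimal sketch
in plain Python (not Spark code). The helper names `batched` and
`count_batches` are hypothetical, and the two keys in `PANDAS_UDF_CONFS` are
what I believe to be the existing Pandas UDF settings corresponding to the two
items above; the values shown are only illustrative. The issue itself does not
name the new Python UDF configs, so none are assumed here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Existing Pandas UDF settings that appear to match the two items above.
# The values are illustrative, not recommendations.
PANDAS_UDF_CONFS = {
    "spark.sql.execution.arrow.maxRecordsPerBatch": "10000",  # batch size
    "spark.sql.execution.pandas.udf.buffer.size": "65536",    # buffer size
}


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group rows into lists of at most batch_size elements.

    Hypothetical stand-in for the grouping done before rows are sent from
    the executor to the Python worker.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_batches(num_rows: int, batch_size: int) -> int:
    """Count how many batches are needed for num_rows rows."""
    return sum(1 for _ in batched(range(num_rows), batch_size))


if __name__ == "__main__":
    # A fixed batch size forces the same number of round trips regardless
    # of row width; a configurable one lets the user trade memory for
    # fewer, larger batches.
    for size in (100, 1000, 10000):
        print(size, count_batches(100_000, size))
```

With a hard-coded size, the number of executor <-> Python worker exchanges is
fixed by the row count alone; a config would let users adjust it per workload.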



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
