nicklamiller opened a new pull request, #47493: URL: https://github.com/apache/spark/pull/47493
### What changes were proposed in this pull request?

Propagate the function signature of `func` in `DataFrame(...).transform(...)`.

### Why are the changes needed?

Propagating the function signature of `func` in `DataFrame(...).transform(...)` enables type checkers like `mypy` to enforce that `func` is called correctly through `DataFrame(...).transform(...)`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Ran an example script that passes `add_num` to `transform` with inappropriate arguments:
- first running mypy on `master`, showing that no mypy errors are raised
- then running mypy with the changes in this PR on the `type-hint-transform-args-kwargs` branch
  - this shows that the expected mypy errors are raised

<details><summary>screenshot of no mypy errors on master</summary>
<p>

<img width="904" alt="no_mypy_errors_master" src="https://github.com/user-attachments/assets/a2bb01b2-8ca8-41d6-a50a-fe95d7a1cfd7">

</p>
</details>

<details><summary>screenshot of expected mypy errors with PR changes</summary>
<p>

<img width="1348" alt="mypy_errors_pr_changes" src="https://github.com/user-attachments/assets/8d7920cf-e9ad-42a0-b435-1f55b0469f29">

</p>
</details>

<details><summary>example script</summary>
<p>

```python
from pyspark.sql import DataFrame, functions as F, SparkSession

spark = (
    SparkSession
    .builder
    .appName("Python Spark SQL basic example")
    .getOrCreate()
)

df = spark.createDataFrame([("a", 0), ("b", 1)], schema=["col1", "col2"])


def add_num(df: DataFrame, in_colname: str, *, num: int) -> DataFrame:
    return df.withColumn("new_col", F.col(in_colname) + num)


# Each call below is intentionally wrong, so that mypy has something to flag.
if __name__ == "__main__":
    df.transform(add_num, "col2", 2).show()  # enforces kw (`num` is keyword-only)
    df.transform(add_num, in_colname="col2", num="a").show()  # enforces type for kwarg
    df.transform(add_num, in_colname=2, num=2).show()  # enforces type for arg
    df.transform(add_num, "col2").show()  # enforces required args
```

</p>
</details>

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org