nicklamiller opened a new pull request, #47493:
URL: https://github.com/apache/spark/pull/47493

   ### What changes were proposed in this pull request?
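   Propagate the signature of `func` through the type hints of `DataFrame(...).transform(...)`, so that the extra positional and keyword arguments passed to `transform` are checked against `func`'s own parameters.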
   
   
   ### Why are the changes needed?
   Propagating the signature of `func` in `DataFrame(...).transform(...)` enables type checkers such as `mypy` to verify that `func` is called correctly through `DataFrame(...).transform(...)`.
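   The sketch below shows what happens without signature propagation. It assumes a loosely typed `transform` annotated in the style of `Callable[..., DataFrame]` with `*args: Any, **kwargs: Any`. `Frame` and `add_num` are hypothetical stand-ins, not PySpark. Under this annotation `mypy` accepts any extra arguments, so a mistake such as passing the keyword-only `num` positionally surfaces only as a `TypeError` when the call executes.

```python
from __future__ import annotations

from typing import Any, Callable


class Frame:
    """Hypothetical stand-in for a DataFrame: a dict of integer columns."""

    def __init__(self, cols: dict[str, list[int]]) -> None:
        self.cols = cols

    # Loose typing: `...` accepts any parameter list, and Any-typed
    # *args/**kwargs are never compared against func's signature.
    def transform(self, func: Callable[..., Frame], *args: Any, **kwargs: Any) -> Frame:
        return func(self, *args, **kwargs)


def add_num(frame: Frame, in_colname: str, *, num: int) -> Frame:
    """Hypothetical helper mirroring `add_num` from the example script."""
    return Frame({**frame.cols, "new_col": [v + num for v in frame.cols[in_colname]]})


def misuse_fails_only_at_runtime() -> bool:
    frame = Frame({"col2": [0, 1]})
    try:
        # mypy does not flag this under the loose typing above;
        # Python raises TypeError because `num` is keyword-only.
        frame.transform(add_num, "col2", 2)
    except TypeError:
        return True
    return False
```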
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Ran an example script that passes `add_num` to `transform` with inappropriate arguments:
   - first ran mypy on `master`, which reports no errors
   - then ran mypy with the changes in this PR on the `type-hint-transform-args-kwargs` branch
      - this shows that the expected mypy errors are raised
   
   <details><summary>screenshot of no mypy errors on master</summary>
   <p>
   
   <img width="904" alt="no_mypy_errors_master" src="https://github.com/user-attachments/assets/a2bb01b2-8ca8-41d6-a50a-fe95d7a1cfd7">
   </p>
   </details> 
   
   <details><summary>screenshot of expected mypy errors with PR changes</summary>
   <p>
   
   <img width="1348" alt="mypy_errors_pr_changes" src="https://github.com/user-attachments/assets/8d7920cf-e9ad-42a0-b435-1f55b0469f29">
   </p>
   </details> 
   
   
   <details><summary>example script</summary>
   <p>
   
   ```python
   
   from pyspark.sql import DataFrame, functions as F, SparkSession
   
   spark = (
       SparkSession
       .builder
       .appName("Python Spark SQL basic example")
       .getOrCreate()
   )
   
   
   df = spark.createDataFrame([("a", 0), ("b", 1)], schema=["col1", "col2"])
   
   
   def add_num(df: DataFrame, in_colname: str, *, num: int) -> DataFrame:
       return df.withColumn("new_col", F.col(in_colname) + num)
   
   
   # Each call deliberately misuses `add_num`; run mypy on this file to see the errors.
   if __name__ == "__main__":
       df.transform(add_num, "col2", 2).show()                   # enforces keyword-only `num`
       df.transform(add_num, in_colname="col2", num="a").show()  # enforces type of kwarg `num`
       df.transform(add_num, in_colname=2, num=2).show()         # enforces type of `in_colname`
       df.transform(add_num, "col2").show()                      # enforces required `num`
   ``` 
   
   </p>
   </details> 
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

