timsaucer opened a new issue, #807:
URL: https://github.com/apache/datafusion-python/issues/807

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Add a function akin to `DataFrame.transform` from PySpark. This gives an 
easy-to-use way to chain DataFrame transformations.
   
   **Describe the solution you'd like**
   
   It is common to write a Python function that takes as its input a DataFrame 
plus zero or more arguments and returns a DataFrame. It is convenient to be able 
to write functions this way and to chain them. For example:
   
   ```python
   from datafusion import DataFrame, lit

   def add_something_cool(df: DataFrame) -> DataFrame:
       return df.with_column("the_answer", lit(42))
   
   def add_another(df: DataFrame, col_name: str) -> DataFrame:
       return df.with_column(col_name, lit("another"))
   
   # `transform` is the method proposed in this issue; extra arguments are passed through.
   df_original.transform(add_something_cool).transform(add_another, "second_col").show()
   ```
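
To make the requested behaviour concrete, here is a minimal sketch of the semantics, assuming `transform` simply calls the supplied function with the DataFrame as the first argument and forwards any extra positional arguments. `Frame` is a hypothetical stand-in class so the snippet runs without DataFusion; it is not the library's API, and the real method might also want to forward keyword arguments.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for a DataFrame: maps column name -> literal value."""

    columns: dict[str, Any] = field(default_factory=dict)

    def with_column(self, name: str, value: Any) -> Frame:
        # Return a new Frame and leave the original untouched.
        return Frame({**self.columns, name: value})

    def transform(self, func: Callable[..., Frame], *args: Any) -> Frame:
        # Assumed semantics: call func with this frame first, then the extra arguments.
        return func(self, *args)


def add_something_cool(df: Frame) -> Frame:
    return df.with_column("the_answer", 42)


def add_another(df: Frame, col_name: str) -> Frame:
    return df.with_column(col_name, "another")


result = Frame().transform(add_something_cool).transform(add_another, "second_col")
print(result.columns)
```

Because each call returns a new object, the chain reads top to bottom in the order the steps are applied, with no intermediate variables.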
   
   **Describe alternatives you've considered**
   
   Without this, I would probably write the above operation as:
   
   ```python
   df = add_something_cool(df_original)
   df = add_another(df, "second_col")
   df.show()
   ```
   
   **Additional context**
   
   [Documentation via Databricks for 
pyspark](https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.DataFrame.transform.html)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

