[PR] Feat/parameterized sql queries [datafusion-python]

via GitHub Thu, 05 Dec 2024 18:01:16 -0800


timsaucer opened a new pull request, #964:
URL: https://github.com/apache/datafusion-python/pull/964


   # Which issue does this PR close?
   
   Closes #513
   
   # Rationale for this change
   
   Users would like to pass DataFrames as parameters to a SQL query. With this change, you can do the following:
   
   ```python
   from datafusion import SessionContext

   ctx = SessionContext()
   df_customer = ctx.read_parquet("examples/tpch/data/customer.parquet")
   # {df} is replaced with the SQL equivalent of df_customer's logical plan
   ctx.sql("select c_custkey, c_name from {df}", df=df_customer)
   ```
   
   The placeholder `{df}` in the query is replaced with the SQL equivalent of the DataFrame's logical plan.
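As a rough illustration of the placeholder step, the sketch below walks the query with Python's `string.Formatter().parse` and swaps each `{name}` field for a parenthesized subquery. The function name `substitute_placeholders` is hypothetical. The real conversion of a DataFrame's logical plan to SQL is replaced here by a plain mapping from placeholder name to an already-generated SQL string, and wrapping that SQL in parentheses is an assumption, not necessarily what this PR does.

```python
import string
from typing import Mapping


def substitute_placeholders(query: str, plans_as_sql: Mapping[str, str]) -> str:
    """Replace each {name} placeholder with a parenthesized SQL subquery.

    Hypothetical helper: `plans_as_sql` stands in for unparsing each
    DataFrame's logical plan to SQL; the SQL strings are supplied directly.
    """
    parts = []
    for literal, field, spec, conversion in string.Formatter().parse(query):
        parts.append(literal)  # text before the placeholder ("{{" is unescaped to "{")
        if field is None:
            continue  # trailing literal text with no placeholder after it
        if spec or conversion:
            raise ValueError(f"format spec/conversion not supported: {{{field}}}")
        if field not in plans_as_sql:
            raise KeyError(f"no DataFrame supplied for placeholder {{{field}}}")
        # Assumption: wrap the SQL in parentheses so it acts as a derived table.
        parts.append(f"({plans_as_sql[field]})")
    return "".join(parts)
```

For example, `substitute_placeholders("select c_custkey from {df}", {"df": "SELECT * FROM customer"})` produces `select c_custkey from (SELECT * FROM customer)`.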
   
   # What changes are included in this PR?
   
   The `read_parquet`, `read_avro`, `read_json`, and `read_csv` methods now call the corresponding `register_*` method (for example, `read_parquet` calls `register_parquet`) with a generated table name. The table name is the file name; if a table with that name already exists, a generated UUID is used instead.
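The naming rule can be sketched as follows. `generate_table_name` is a hypothetical helper, and two details are assumptions rather than confirmed behavior of this PR: the file extension is dropped (`customer.parquet` becomes `customer`), and the UUID fallback uses the 32-character hex form.

```python
import uuid
from pathlib import Path
from typing import Collection


def generate_table_name(path: str, existing: Collection[str]) -> str:
    """Derive a table name from a file path, falling back to a UUID on collision."""
    # Assumption: the extension is stripped, e.g. "customer.parquet" -> "customer".
    name = Path(path).stem
    if name in existing:
        # The hex form avoids the hyphens that str(uuid.uuid4()) would contain.
        return uuid.uuid4().hex
    return name
```

Reading `examples/tpch/data/customer.parquet` into an empty context would register it as `customer`; reading a second file with the same name would fall back to a random hex string.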
   
   One unit test is included.
   
   # Are there any user-facing changes?
   
   Each of the `read_` functions above gains an optional table name parameter. Because the parameter is optional, this is a non-breaking change for users.
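A minimal sketch of why an optional parameter keeps existing calls working. `ToyContext` is a hypothetical stand-in for `SessionContext`: it records registrations in a dict instead of calling `register_parquet`, and its `read_parquet` returns the table name rather than a DataFrame. Using an explicitly supplied name as-is is an assumption about how collisions are handled.

```python
import uuid
from pathlib import Path
from typing import Dict, Optional


class ToyContext:
    """Hypothetical stand-in for SessionContext that only tracks registrations."""

    def __init__(self) -> None:
        self.tables: Dict[str, str] = {}  # table name -> file path

    def read_parquet(self, path: str, table_name: Optional[str] = None) -> str:
        # Existing callers pass only `path`; the defaulted keyword argument
        # leaves those calls valid, which is what makes the change non-breaking.
        if table_name is None:
            table_name = Path(path).stem  # assumption: extension dropped
            if table_name in self.tables:
                table_name = uuid.uuid4().hex  # collision: fall back to a UUID
        # Assumption: an explicit table_name is used as-is.
        self.tables[table_name] = path  # stands in for register_parquet(...)
        return table_name
```

In this sketch, a call written before the change, such as `ctx.read_parquet(path)`, still works; only callers that want a specific name pass `table_name`.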
