timsaucer opened a new pull request, #964: URL: https://github.com/apache/datafusion-python/pull/964
# Which issue does this PR close?

Closes #513

# Rationale for this change

Users would like to use DataFrames as a parameter inside an SQL query. With this change, you can do the following:

```
from datafusion import SessionContext

ctx = SessionContext()
df_customer = ctx.read_parquet("examples/tpch/data/customer.parquet")
ctx.sql("select c_custkey, c_name from {df}", df=df_customer)
```

The string `{df}` in the query will be replaced with the SQL equivalent of the logical plan of the DataFrame.

# What changes are included in this PR?

The `read_parquet`, `read_avro`, `read_json`, and `read_csv` functions have all been changed to call the corresponding `register_` function with a generated table name. This table name is the file name. If a table with that name already exists, a generated UUID is used instead.

One unit test is included.

# Are there any user-facing changes?

Each of the `read_` functions above gains an optional table name parameter. This is a non-breaking change for users.
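The table naming rule described under "What changes are included in this PR?" can be sketched roughly as follows. This is only an illustration, not the code in the PR: `generate_table_name` and `existing_names` are hypothetical names, and it assumes that "file name" means the name without its directory or extension.

```python
import uuid
from pathlib import Path


def generate_table_name(path: str, existing_names: set[str]) -> str:
    """Hypothetical helper: choose a table name for a file being read."""
    # Assumption: "file name" means the name without directory or extension.
    candidate = Path(path).stem
    if candidate not in existing_names:
        return candidate
    # Name collision: fall back to a random UUID (hex form, no dashes).
    return uuid.uuid4().hex


# Stand-in for the set of table names already registered in a context.
registered: set[str] = set()

first = generate_table_name("examples/tpch/data/customer.parquet", registered)
registered.add(first)

# A second file with the same name collides, so it gets a UUID instead.
second = generate_table_name("other/dir/customer.parquet", registered)
registered.add(second)

print(first)
print(second != first)
```

The UUID fallback keeps each `read_` call from overwriting a table registered by an earlier call, at the cost of a less readable name for the second table.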