timsaucer opened a new pull request, #964:
URL: https://github.com/apache/datafusion-python/pull/964
# Which issue does this PR close?
Closes #513
# Rationale for this change
Users would like to use DataFrames as parameters inside a SQL query. With
this change, you can do the following:
```
from datafusion import SessionContext
ctx = SessionContext()
df_customer = ctx.read_parquet("examples/tpch/data/customer.parquet")
ctx.sql("select c_custkey, c_name from {df}", df=df_customer)
```
The string `{df}` in the query will be replaced with the SQL equivalent of
the logical plan of the DataFrame.
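To illustrate the substitution step, here is a minimal sketch in plain Python. It is not the code in this PR: `substitute_placeholders` is a hypothetical helper, and a plain string stands in for the SQL text that would be derived from the DataFrame.

```python
import string


def substitute_placeholders(query: str, **replacements: str) -> str:
    """Replace each ``{name}`` in ``query`` with ``replacements[name]``.

    Hypothetical helper for illustration only. The values here are plain
    strings standing in for whatever SQL text a DataFrame would expand to.
    """
    parts = []
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
    # for each segment of the template string.
    for literal, field, _spec, _conversion in string.Formatter().parse(query):
        parts.append(literal)
        if field is None:
            # Trailing literal text with no placeholder after it.
            continue
        if field not in replacements:
            raise KeyError(f"no value supplied for placeholder {{{field}}}")
        parts.append(replacements[field])
    return "".join(parts)


# The string "customer" is an assumed stand-in for the DataFrame's SQL form.
expanded = substitute_placeholders(
    "select c_custkey, c_name from {df}", df="customer"
)
```

Using `string.Formatter().parse` instead of `str.format` keeps control over how each placeholder is resolved, which is where a real implementation would look up the DataFrame passed as a keyword argument.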
# What changes are included in this PR?
The `read_parquet`, `read_avro`, `read_json`, and `read_csv` functions now
call the corresponding `register_` function with a generated table name. The
generated name is the file name; if a table with that name already exists, a
generated UUID is used instead.
One unit test is included.
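The naming rule described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `generate_table_name` and the `existing` set are hypothetical, and whether the file extension is kept in the table name is assumed here.

```python
import uuid
from pathlib import Path


def generate_table_name(path: str, existing: set[str]) -> str:
    """Pick a table name for a file being read.

    Hypothetical helper: ``existing`` stands in for the set of table names
    already registered in the session.
    """
    # Assumption: the full file name, extension included, is the candidate.
    candidate = Path(path).name
    if candidate in existing:
        # Name collision: fall back to a randomly generated UUID.
        return uuid.uuid4().hex
    return candidate


first = generate_table_name("examples/tpch/data/customer.parquet", set())
second = generate_table_name("examples/tpch/data/customer.parquet", {first})
```

Here `first` is derived from the file name, while `second` falls back to a UUID because that name is already taken.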
# Are there any user-facing changes?
Each of the `read_` functions above gains an optional table name parameter.
This is a non-breaking change for users.