alamb commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-3016555361

   I have been thinking about this one a lot and I am sorry I haven't written
this up before now. I was trying to collect my thoughts.
   
   I feel the core issue is a tension between three uses:
   1. `datafusion-cli` as an easy-to-use, fully pre-integrated tool for querying
files (locally and remotely)
   2. Using `datafusion-cli` as a testing vehicle for DataFusion development
itself.
   3. Using `datafusion-cli` as an example of how to integrate various
DataFusion features (like AWS S3, etc)
   
   `datafusion-cli` already has quite a few features that are outside the core
use case of testing DataFusion (e.g. AWS S3 auth support).
   
   The more I think about it, integrating easy-to-use TPCH functions into
`datafusion-cli` feels like it is part of the first use, and thus maybe doesn't
actually belong in the datafusion repository's `datafusion-cli` itself after all.
   
   Some possible paths forward (not mutually exclusive):
   1. Document how to use `tpchgen-rs` to create TPCH data that can be queried
by `datafusion-cli` (somewhere [in the cli
docs](https://datafusion.apache.org/user-guide/cli/index.html)) but don't
actually add more code to `datafusion-cli`
   2. Move the https://github.com/clflushopt/datafusion-tpch repo into the
`datafusion-contrib` GitHub organization so it is more discoverable
   3. Actually implement the datafusion TPCH functions from
https://github.com/clflushopt/datafusion-tpch in the core datafusion
repository (along with a dependency on `tpchgen-rs`).
   4. Create a new repository for a `datafusion-cli++` (probably need a better
name) with the explicit goal of being a fully pre-integrated CLI experience
like `duckdb`
   
   I have been dreaming about the `datafusion-cli++` idea for a while now too.
I think it would be really cool technically to build a tool that was able to
query remote sources really quickly and easily (kind of a `daft`-like
experience) -- think caching parquet metadata, catalog, iceberg, etc. But I am
getting ahead of myself, and I haven't convinced myself this can be done in a
reasonable amount of time.
   
   @matthewmturner is working on something similar in
https://github.com/datafusion-contrib/datafusion-dft but that also includes
other things such as a TUI (in fact he seems to be using tpchgen as well:
https://github.com/datafusion-contrib/datafusion-dft/pull/331 :) )
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

