BlakeOrth commented on issue #19055:
URL: https://github.com/apache/datafusion/issues/19055#issuecomment-3676013903

   > I wonder if I should first add test files in `parquet-testing` repo, or if 
you have other suggestions. Thank you!
   
   @jizezhang I don't think adding files to `parquet-testing` is necessary 
here. The caching functionality works equally well with temporary in-memory 
object stores, so there's no technical reason to prefer files on disk or some 
other remote storage for testing. There are several ways to get valid test 
files.
   
   I would probably recommend you take inspiration from the 
`object_store_access.rs` tests which set up various in-memory tables 
programmatically:
   
https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion/core/tests/datasource/object_store_access.rs#L505
   
   If you're looking to do integration-level testing similar to the tests 
here:
   
https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion-cli/tests/cli_integration.rs#L402
   
   You can use DataFusion to create Parquet files with the `COPY` command. 
You can see various uses of `COPY` with Parquet files in the `sqllogictest` tests:
   
https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion/sqllogictest/test_files/parquet.slt#L18
   
   For testing, I think `COPY` is preferable to `INSERT` because it lets 
you give your files well-defined names. The subtle caveat is that, when the 
cache is enabled, tables that are already defined won't pick up changes made 
by later `COPY` commands, so you'd have to make sure you write all the files 
before creating the table in DataFusion.
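
   As a rough sketch of that ordering (not taken from the linked tests): the 
paths `scratch/cache_demo/...` and the table name `cache_demo` are made up for 
illustration, and this assumes a local filesystem location that DataFusion can 
write to.

```sql
-- Hypothetical paths and table name, for illustration only.

-- 1. Write every file first, giving each one an explicit name.
COPY (VALUES (1, 'a'), (2, 'b'))
TO 'scratch/cache_demo/part-a.parquet'
STORED AS PARQUET;

COPY (VALUES (3, 'c'))
TO 'scratch/cache_demo/part-b.parquet'
STORED AS PARQUET;

-- 2. Only then define the table over the directory, so that any cached
--    file listing is taken after all files exist.
CREATE EXTERNAL TABLE cache_demo
STORED AS PARQUET
LOCATION 'scratch/cache_demo/';

-- 3. Query the table; it should see both files written in step 1.
SELECT count(*) FROM cache_demo;
```

   Running another `COPY` into the same directory after step 2 is the case the 
caveat above warns about: with the cache enabled, the already-defined table may 
not see the new file.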


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

