GitHub user hveiga closed a discussion: QUESTION: Keep columns when using 
PARTITIONED BY with SQL

Hi folks,

I am using DataFusion to repartition data stored in Parquet files into a 
different set of Parquet files. I would like the newly created files to 
contain the columns I am partitioning by. Currently, however, the partition 
column is removed from the files because it becomes part of the directory 
structure. Something like:

```
COPY (SELECT col1, col2, col3, col4 FROM my_external_table) TO '/output' 
PARTITIONED BY (col1) OPTIONS (format parquet);

...

/output/col1=val1/some_random_file_name.parquet
/output/col1=val2/some_random_file_name.parquet
/output/col1=val3/some_random_file_name.parquet
...
```

Is there a way in SQL to keep `col1` in the output Parquet files? If not, would 
it make sense to add a new option to `COPY` for this?
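
For illustration, one possible workaround is to select the partition column 
twice and partition by the duplicate, so that the original `col1` stays in the 
file data. This is only a sketch: the alias `col1_part` is a hypothetical name 
that does not appear in the query above, and it assumes `PARTITIONED BY` 
accepts an aliased column from the subquery. The output directories would then 
be named `col1_part=...` rather than `col1=...`.

```sql
-- Sketch of a workaround; col1_part is a hypothetical alias (an assumption).
-- The duplicate column is consumed by the directory layout, while col1
-- itself remains as a regular column inside each Parquet file.
COPY (SELECT col1, col2, col3, col4, col1 AS col1_part FROM my_external_table) TO '/output'
PARTITIONED BY (col1_part) OPTIONS (format parquet);
```

The trade-off is that readers relying on the `col1=...` directory naming would 
need to adapt to the renamed partition key.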

I have looked through the documentation and the open and closed issues but 
could not find a way to do this. If this is already covered somewhere, a link 
would be greatly appreciated.

Thanks!

GitHub link: https://github.com/apache/datafusion/discussions/10962

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]

