Hi,

In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=q7vvoz20uhg7vxg33zb...@mail.gmail.com>
  "Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
  Sushrut Shivaswamy <sushrut.shivasw...@gmail.com> wrote:

> - To facilitate efficient querying it would help to export multiple
>   parquet files for the table instead of a single file.
>   Having multiple files allows queries to skip chunks if the key range in
>   the chunk does not match query filter criteria.
>   Even within a chunk it would help to be able to configure the size of a
>   row group.
>   - I'm not sure how these parameters will be exposed within `COPY TO`.
>     Or maybe the extension implementing the `COPY TO` handler will
>     allow this configuration?

Yes: the extension implementing the `COPY TO` handler will allow this
configuration. But adding support for custom COPY TO options is
out-of-scope in the first version. We will focus on only the minimal
features in the first version. We can improve it later based on
use-cases. See also:
https://www.postgresql.org/message-id/20240131.141122.279551156957581322.kou%40clear-code.com

> - Regarding using file_fdw to read Apache Arrow and Apache Parquet files
>   because file_fdw is based on COPY FROM:
>   - I'm not too clear on this. file_fdw seems to allow creating a table
>     from data on disk exported using COPY TO.

Correct.

>     But is the newly created table still using the data on disk (maybe in
>     columnar format or csv), or is it just reading that data to create a
>     row-based table?

The former: the foreign table keeps reading the data on disk. No
row-based copy of the data is created.

>     I'm not aware of any capability in the postgres planner to read
>     columnar files currently without using an extension like parquet_fdw.

Correct. With the COPY format extensible feature, file_fdw can just read
Apache Parquet data by SELECT, but to optimize queries against Apache
Parquet data we still need another approach such as parquet_fdw.
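To make that concrete, here is a rough sketch of the whole flow. The
"copy_parquet" extension, the 'parquet' format name, the "row_group_size"
option and the "events" table are hypothetical stand-ins, not part of the
proposal; only file_fdw itself exists today:

    -- Hypothetical extension that registers a 'parquet' COPY format
    -- handler via the extensible COPY format feature:
    CREATE EXTENSION copy_parquet;

    -- Export. The first version has no custom options; a handler might
    -- later accept e.g. a (hypothetical) row_group_size option:
    COPY events TO '/tmp/events.parquet' WITH (FORMAT 'parquet');

    -- file_fdw is based on COPY FROM, so it can read the same file
    -- back with a plain SELECT:
    CREATE EXTENSION file_fdw;
    CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE events_parquet (
      id      bigint,
      payload text
    ) SERVER files OPTIONS (
      filename '/tmp/events.parquet',
      format 'parquet'
    );

    -- This decodes the file for every scan; unlike parquet_fdw, the
    -- planner can't push the WHERE clause down to skip row groups:
    SELECT count(*) FROM events_parquet WHERE id > 1000;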
Sorry for confusing you.


Thanks,
-- 
kou