Hi,

In <CAH5mb98Dq7ssrQq9n5yW3G1YznH=q7vvoz20uhg7vxg33zb...@mail.gmail.com>
  "Re: Columnar format export in Postgres" on Thu, 13 Jun 2024 22:30:24 +0530,
  Sushrut Shivaswamy <sushrut.shivasw...@gmail.com> wrote:

>  - To facilitate efficient querying it would help to export multiple
> parquet files for the table instead of a single file.
>    Having multiple files allows queries to skip chunks if the key range in
> the chunk does not match query filter criteria.
>    Even within a chunk it would help to be able to configure the size of a
> row group.
>       - I'm not sure how these parameters will be exposed within `COPY TO`.
>         Or maybe the extension implementing the `COPY TO` handler will
> allow this configuration?

Yes. But adding support for custom COPY TO options is
out-of-scope in the first version. We will focus on only the
minimal features in the first version. We can improve it
later based on use-cases.

See also: 
https://www.postgresql.org/message-id/20240131.141122.279551156957581322.kou%40clear-code.com
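For illustration, a minimal sketch of what such usage might look like once a COPY format handler extension exists; the 'parquet' format name and the row_group_size option are assumptions here, not a shipped API:

```sql
-- Hypothetical: assumes an extension has registered a 'parquet' COPY format.
COPY my_table TO '/tmp/my_table.parquet' WITH (FORMAT 'parquet');

-- Custom per-format options like this are what's out of scope
-- for the first version:
-- COPY my_table TO '/tmp/my_table.parquet'
--   WITH (FORMAT 'parquet', row_group_size 100000);
```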

>  - Regarding using file_fdw to read Apache Arrow and Apache Parquet file
> because file_fdw is based on COPY FROM:
>      - I'm not too clear on this. file_fdw seems to allow creating a table
> from  data on disk exported using COPY TO.

Correct.

>        But is the newly created table still using the data on disk(maybe in
> columnar format or csv) or is it just reading that data to create a row
> based table.

The former.

>        I'm not aware of any capability in the postgres planner to read
> columnar files currently without using an extension like parquet_fdw.

Correct. We still need another approach, such as parquet_fdw
combined with the extensible COPY format feature, to optimize
queries against Apache Parquet data. file_fdw can only read
Apache Parquet data via SELECT. Sorry for the confusion.
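To make the distinction concrete, here is a hedged sketch of what file_fdw usage could look like with a hypothetical 'parquet' COPY FROM format (the format name and file path are assumptions):

```sql
-- Hypothetical: assumes an extension provides a 'parquet' COPY format.
CREATE EXTENSION file_fdw;
CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE events (id int, ts timestamptz)
  SERVER files
  OPTIONS (filename '/data/events.parquet', format 'parquet');

-- SELECT works, but file_fdw reads all rows through COPY FROM;
-- unlike parquet_fdw, it cannot skip row groups or project columns.
SELECT count(*) FROM events;
```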


Thanks,
-- 
kou