Re: speed up full table scan using psql

2023-06-01 Thread Adrian Klaver
On 5/31/23 22:51, Lian Jiang wrote: The whole command is: psql %(pg_uri)s -c %(sql)s | %(sed)s | %(pv)s | %(split)s 2>&1 | %(tr)s where: sql is "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout" and sed, pv, split, tr together format and split the stdout into jsonl files ...
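[For readers skimming the archive, here is a minimal sketch of what such a pipeline can look like; the URI, chunk size, and output names are hypothetical, and GNU split's --filter option is assumed:

    # stream the table as one JSON object per line, show progress,
    # and split into gzipped ~1M-line chunks
    psql "$PG_URI" -c "copy (select row_to_json(t) from public.mytable t) to stdout" \
      | pv -l \
      | split -l 1000000 --filter='gzip > "$FILE.jsonl.gz"' - mytable_part_
]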

Re: speed up full table scan using psql

2023-05-31 Thread Lian Jiang
The whole command is: psql %(pg_uri)s -c %(sql)s | %(sed)s | %(pv)s | %(split)s 2>&1 | %(tr)s where: sql is "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout" and sed, pv, split, tr together format and split the stdout into jsonl files. Hope this helps. On Wed, May 3 ...

Re: speed up full table scan using psql

2023-05-31 Thread Adrian Klaver
On 5/31/23 13:57, Lian Jiang wrote: The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout" What is taking the stdout and what is it doing? postgres version: 14.7 Does this mean COPY and java CopyManager may not help since my psql ...

Re: speed up full table scan using psql

2023-05-31 Thread Thorsten Glaser
On Wed, 31 May 2023, Adrian Klaver wrote: > Given that I would just COPY the data out as CSV. I recently did something similar. I found the JSON functions not quite satisfying and the extra spaces redundant, but it turns out that, for a numerical table, exporting as CSV and loading that via AJAX ...
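[As a concrete point of comparison, a plain CSV export of the same table might look like this; the column list is hypothetical:

    psql "$PG_URI" -c "copy (select id, payload from public.mytable) to stdout with (format csv, header)" > mytable.csv

CSV avoids repeating the key names in every row, which is where much of the row_to_json output's extra bulk comes from.]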

Re: speed up full table scan using psql

2023-05-31 Thread Adrian Klaver
On 5/31/23 13:57, Lian Jiang wrote: The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout" postgres version: 14.7 Does this mean COPY and java CopyManager may not help since my psql command already uses copy? Regarding pg_dump, it does not support json format ...

Re: speed up full table scan using psql

2023-05-31 Thread Adrian Klaver
On 5/31/23 13:57, Lian Jiang wrote: The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout" postgres version: 14.7 Does this mean COPY and java CopyManager may not help since my psql command already uses copy? I don't think the issue is ...

Re: speed up full table scan using psql

2023-05-31 Thread Lian Jiang
The command is: psql $db_url -c "copy (select row_to_json(x_tmp_uniq) from public.mytable x_tmp_uniq) to stdout" postgres version: 14.7 Does this mean COPY and java CopyManager may not help since my psql command already uses copy? Regarding pg_dump, it does not support json format which means ...
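[If the single COPY stream turns out to be CPU-bound on row_to_json serialization, one workaround is to run several non-overlapping COPY commands in parallel. A sketch, assuming a hypothetical integer primary key id and eight workers:

    # each worker exports the rows whose id falls in its modulo slot
    for i in $(seq 0 7); do
      psql "$PG_URI" -c "copy (select row_to_json(t) from public.mytable t where t.id % 8 = $i) to stdout" \
        | gzip > "mytable_part_$i.jsonl.gz" &
    done
    wait

Each worker still reads the whole table under a modulo predicate, so this mainly helps when serialization rather than I/O dominates; range predicates on an indexed key would avoid the repeated scans.]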

Re: speed up full table scan using psql

2023-05-31 Thread Adrian Klaver
On 5/30/23 21:25, Lian Jiang wrote: Hi, I am using psql to periodically dump the postgres tables into json files which are imported into snowflake. For large tables (e.g. 70M rows), it takes hours for psql to complete. Using spark to read the ... The command you are using is? Postgres version?

Re: speed up full table scan using psql

2023-05-30 Thread Lian Jiang
Thanks! Will try. Is there a way to speed up the Spark ingestion path (postgres -> spark -> snowflake) that avoids issues such as alignment or an inefficient json file format? I favor spark since it simplifies the ingest path of postgres -> local json.gz -> s3 -> snowflake stage -> snowflake table. ...
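[If the local json.gz file exists only as a stop on the way to S3, the psql export can be streamed straight through without touching disk. A sketch, with a hypothetical bucket and key, assuming the AWS CLI is installed:

    # compress the COPY stream and upload it to S3 in one pass
    psql "$PG_URI" -c "copy (select row_to_json(t) from public.mytable t) to stdout" \
      | gzip \
      | aws s3 cp - "s3://my-bucket/mytable/part-0000.jsonl.gz"
]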

Re: speed up full table scan using psql

2023-05-30 Thread David Rowley
On Wed, 31 May 2023 at 16:26, Lian Jiang wrote: > I am using psql to periodically dump the postgres tables into json files which are imported into snowflake. For large tables (e.g. 70M rows), it takes hours for psql to complete. Using spark to read the postgres table seems not to work as ...

Re: speed up full table scan using psql

2023-05-30 Thread Rob Sargent
On 5/30/23 22:25, Lian Jiang wrote: Hi, I am using psql to periodically dump the postgres tables into json files which are imported into snowflake. For large tables (e.g. 70M rows), it takes hours for psql to complete. Using spark to read the postgres table seems not to work as the postgres read only replication is the bottleneck ...

speed up full table scan using psql

2023-05-30 Thread Lian Jiang
Hi, I am using psql to periodically dump the postgres tables into json files which are imported into snowflake. For large tables (e.g. 70M rows), it takes hours for psql to complete. Using spark to read the postgres table seems not to work, as the postgres read only replication is the bottleneck, so ...
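[A quick way to see where the hours actually go is to time the COPY with its output discarded; if this baseline is already slow, the table scan and row_to_json serialization are the bottleneck rather than the downstream tooling. The URI is hypothetical:

    time psql "$PG_URI" -c "copy (select row_to_json(t) from public.mytable t) to stdout" > /dev/null
]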