On 5/30/23 22:25, Lian Jiang wrote:
Hi,
I am using psql to periodically dump Postgres tables into JSON
files, which are then imported into Snowflake. For large tables (e.g. 70M
rows), psql takes hours to complete. Using Spark to read the
Postgres table does not seem to work: the read-only Postgres replica
is the bottleneck, so the Spark cluster never uses more than one worker node,
and that worker node times out or runs out of memory.
Will vertically scaling the Postgres DB speed up psql? Or is there any
thread-related psql parameter that can help? Thanks for any hints.
Regards
Lian
Have you looked into the COPY command? Or the CopyManager Java class?
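For illustration, here is a minimal sketch of building COPY statements that emit one JSON object per row, using PostgreSQL's `row_to_json` inside `COPY (...) TO STDOUT`. The table name `events`, the integer key column `id`, and the idea of splitting the key range so several COPY sessions run concurrently (psql itself is single-threaded) are all hypothetical assumptions, not something from the thread. Each statement could be run with `psql -c "<statement>" > part.json`, or passed to PgJDBC's `CopyManager.copyOut(sql, writer)`. Note that COPY's default text format escapes backslashes, so JSON values containing escape sequences may need post-processing before loading.

```python
from typing import List, Optional, Tuple


def quote_ident(name: str) -> str:
    """Double-quote a bare SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def json_copy_sql(table: str, key: Optional[str] = None,
                  lo: Optional[int] = None, hi: Optional[int] = None) -> str:
    """Build a COPY ... TO STDOUT statement that emits one JSON object per row.

    `table` is assumed to be an unqualified table name. If key, lo and hi are
    all given, only rows with lo <= key < hi are exported (a half-open range).
    """
    query = f"SELECT row_to_json(t) FROM {quote_ident(table)} t"
    if key is not None and lo is not None and hi is not None:
        col = quote_ident(key)
        query += f" WHERE t.{col} >= {int(lo)} AND t.{col} < {int(hi)}"
    return f"COPY ({query}) TO STDOUT"


def split_range(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Split [lo, hi) into `parts` contiguous half-open ranges of near-equal size."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and hi >= lo")
    size, extra = divmod(hi - lo, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` ranges get one additional key so all keys are covered.
        end = start + size + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # Hypothetical table "events" with integer key "id" spanning [0, 70_000_000).
    for lo, hi in split_range(0, 70_000_000, 4):
        print(json_copy_sql("events", "id", lo, hi))
```

Whether running several range-restricted COPY sessions in parallel actually helps depends on the replica having spare CPU and I/O and on the key column being indexed; that is an assumption to verify, not a guarantee.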