Thanks! Will try. Is there a way to speed up Spark ingestion (postgres -> spark -> snowflake) which does not have issues such as alignment or inefficient json file format? I favor spark since it simplifies the ingest path of postgres -> local json.gz -> s3 -> snowflake stage -> snowflake table. Hope this makes sense.
On Tue, May 30, 2023 at 10:17 PM David Rowley <dgrowle...@gmail.com> wrote: > On Wed, 31 May 2023 at 16:26, Lian Jiang <jiangok2...@gmail.com> wrote: > > I am using psql to periodically dump the postgres tables into json files > which are imported into snowflake. For large tables (e.g. 70M rows), it > takes hours for psql to complete. Using spark to read the postgres table > seems not to work as the postgres read only replication is the bottleneck > so spark cluster never uses >1 worker node and the working node timeout or > out of memory. > > > > Will vertical scaling the postgres db speed up psql? Or any thread > related parameter of psql can help? Thanks for any hints. > > This is probably slow due to psql's alignment. It needs to read the > entire result to know how much to whitespace to pad columns with > before it can output anything. Naturally, that requires quite a bit of > memory when large tables are queried and also lots of length checking. > > As mentioned by Rob, you'll probably want to use COPY, or you could > look at using pg_dump. Something like: > > pg_dump --inserts --rows-per-insert=100 --table=<name of your table to > export> --data-only <database name> > > should give you something that you might have some luck directly > importing to Snowflake without any transformation tool required. > > However, if you do need to do some transformation before loading, then > you might want to do that in PostgreSQL. For that, something like: > > psql -c "copy (select col1,col2,col3+col4 from your_table) to stdout" > <name of database> > > would allow you to run a query, which you could maybe do your > transformations in before importing into Snowflake > > David > -- Create your own email signature <https://www.wisestamp.com/signature-in-email/?utm_source=promotion&utm_medium=signature&utm_campaign=create_your_own&srcid=5234462839406592>