Re: ETL - sql orchestrator is stuck when there is not sleep() between queries

Imre Samu Thu, 08 Jul 2021 06:47:44 -0700

> We use different machines, different config, and different datasets.
> ...
>  PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit


Is It possible to upgrade and test with PG 12.7?

IMHO: lot of changes:
* https://www.postgresql.org/docs/12/release-12-5.html
* https://www.postgresql.org/docs/12/release-12-6.html
* https://www.postgresql.org/docs/12/release-12-7.html

Just rule out the possibility that it has been already fixed

Regards,
 Imre


Allan Barrielle <allan.barrie...@gmail.com> ezt írta (időpont: 2021. júl.
8., Cs, 11:26):

> Hello guys, I'm facing a problem. Currently I'm working on a Data
> transformation Pipeline on Postgres. The strategy is,
>
> We select every tables in a given schema ( 50 tables ), we apply some case
> when, translation, enum and load it into a different new schema with a
> CREATE TABLE SCHEMA_2.table_1 AS SELECT * FROM SCHEMA_1.TABLE_1, then we do
> it again about 3 more times and everytime it’s a new schema, new table. We
> only keep and don’t drop the schema1.
>
> To orchestrate the whole, we've got a bunch of .sql files that we run by
> using psql directly. That's our "strategy".
>
> So we're copying a lot of data, but it allows us to debug, and investigate
> business bugs, because we can plug us into schema 2,3 and search why it's
> an issue.
>
> All is fine, and can work great.
> But sometimes, some queries that used to take about 20 secs to complete
> can suddenly end in 5mins.
> Important all queries have the same shape -> CREATE TABLE SELECT AS *(a
> bit of transform) FROM TABLE). No update, nothing, it’s dead simple.
>
> We are just trying to copy a table from schema1, to schema2, to schema3
> and finally schema3. That’s it.
> The thing to understand here is schema2, schema3 are dropped at every
> pipeline transformation, so everytime we run the script, it drops
> everything from schema2 to the final stage.
>
> We tuned the config a little bit, and we tried kind of everything (
> synchronous_commit, wal, vacuum )
> Nothing works, it’s very random, some query won’t simply work ( even after
> hours ).
>
> We use different machines, different config, and different datasets.
>
> The only thing that makes it work every time, in 100% cases, is to put a
> sleep(10sec) between each schema.
> So we select 50 tables, we create a new schema with it, then we sleep 10
> sec then we do again the same query but with the freshly created schema and
> we create a third schema, sleep 10s and again..
>
> And that makes the whole pipeline successful each time.
>
> So, It seems it's a background process inside postgres, that should ingest
> a lot of data, and we have to give him time to take a rest, like a
> bg_writers  or something else ?
> I disabled autovacuum=off . Same.
> Why does the query never end even after hours ? Why there is no log about
> where the query is stuck.
> To be clear, if I kill the stuck query and run again it will work.
>
> I don't know much about what's going on inside Postgres, which randomly
> takes a lot of time, with the same code, same data.
>
> PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled
> by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
>
> Thank you so much for your time..
>
>
>

Re: ETL - sql orchestrator is stuck when there is not sleep() between queries

Reply via email to