> We use different machines, different config, and different datasets. > ... > PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
Is It possible to upgrade and test with PG 12.7? IMHO: lot of changes: * https://www.postgresql.org/docs/12/release-12-5.html * https://www.postgresql.org/docs/12/release-12-6.html * https://www.postgresql.org/docs/12/release-12-7.html Just rule out the possibility that it has been already fixed Regards, Imre Allan Barrielle <allan.barrie...@gmail.com> ezt írta (időpont: 2021. júl. 8., Cs, 11:26): > Hello guys, I'm facing a problem. Currently I'm working on a Data > transformation Pipeline on Postgres. The strategy is, > > We select every tables in a given schema ( 50 tables ), we apply some case > when, translation, enum and load it into a different new schema with a > CREATE TABLE SCHEMA_2.table_1 AS SELECT * FROM SCHEMA_1.TABLE_1, then we do > it again about 3 more times and everytime it’s a new schema, new table. We > only keep and don’t drop the schema1. > > To orchestrate the whole, we've got a bunch of .sql files that we run by > using psql directly. That's our "strategy". > > So we're copying a lot of data, but it allows us to debug, and investigate > business bugs, because we can plug us into schema 2,3 and search why it's > an issue. > > All is fine, and can work great. > But sometimes, some queries that used to take about 20 secs to complete > can suddenly end in 5mins. > Important all queries have the same shape -> CREATE TABLE SELECT AS *(a > bit of transform) FROM TABLE). No update, nothing, it’s dead simple. > > We are just trying to copy a table from schema1, to schema2, to schema3 > and finally schema3. That’s it. > The thing to understand here is schema2, schema3 are dropped at every > pipeline transformation, so everytime we run the script, it drops > everything from schema2 to the final stage. > > We tuned the config a little bit, and we tried kind of everything ( > synchronous_commit, wal, vacuum ) > Nothing works, it’s very random, some query won’t simply work ( even after > hours ). > > We use different machines, different config, and different datasets. > > The only thing that makes it work every time, in 100% cases, is to put a > sleep(10sec) between each schema. > So we select 50 tables, we create a new schema with it, then we sleep 10 > sec then we do again the same query but with the freshly created schema and > we create a third schema, sleep 10s and again.. > > And that makes the whole pipeline successful each time. > > So, It seems it's a background process inside postgres, that should ingest > a lot of data, and we have to give him time to take a rest, like a > bg_writers or something else ? > I disabled autovacuum=off . Same. > Why does the query never end even after hours ? Why there is no log about > where the query is stuck. > To be clear, if I kill the stuck query and run again it will work. > > I don't know much about what's going on inside Postgres, which randomly > takes a lot of time, with the same code, same data. > > PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled > by gcc (Debian 8.3.0-6) 8.3.0, 64-bit > > Thank you so much for your time.. > > >