PgBackRest and WAL archive expiry
List,

I successfully configured pgBackRest 2.52.1 on RHEL 9.4 with EPAS 16.1 for a couple of production servers and a remote repo server. Everything seems to be working as expected, but I have a serious concern about the archive directory growing day by day.

1. On the EPAS server I have postgresql.conf with

archive_command = 'pgbackrest --stanza=EMI_Repo archive-push %p && cp %p /data/archive/%f'

The problem is that the /data/archive folder has grown within a few days to 850 GB of a 2 TB partition.

What is the mechanism to check/expire the WAL archive directory automatically? How do others control this behaviour, and based on what criteria, so that PITR won't be badly affected if we manually delete WALs from the archive directory?

Does Postgres or pgBackRest have any command/directive to control the /data/archive growth after a considerable amount of time or disk usage (or on any other condition) without affecting PITR?

Please share your expertise so we can arrive at a healthy WAL retention policy on the DB server as well as on the repo server.

Thank you,
Krishane

For any more inputs, from the DB server:

161G    ./edb/as16/tablespace/ERSS
161G    ./edb/as16/tablespace
167G    ./edb/as16
167G    ./edb
854G    ./archive
229M    ./backup
1.1T    .

[root@db1 data]# cd /data/archive/
[root@db1 archive]# du -h
854G    .
[root@db1 archive]#
[root@db1 archive]# df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    4.0M     0  4.0M   0% /dev
tmpfs                       7.7G   11M  7.7G   1% /dev/shm
tmpfs                       3.1G   28M  3.1G   1% /run
/dev/mapper/rhel_bc68-root   20G  6.6G   14G  33% /
/dev/mapper/rhel_bc68-app   5.0G   68M  4.9G   2% /app
/dev/mapper/rhel_bc68-data  2.0T  1.1T  979G  52% /data
/dev/sda2                   960M  372M  589M  39% /boot
/dev/sda1                   599M  7.1M  592M   2% /boot/efi
tmpfs                       1.6G   52K  1.6G   1% /run/user/42
tmpfs                       1.6G   36K  1.6G   1% /run/user/0
[root@db1 archive]#

From postgresql.conf:

                                # (change requires restart)
archive_mode = on               # enables archiving; off, on, or always
                                # (change requires restart)
                                # (empty string indicates archive_command should
                                # be used)
archive_command = 'pgbackrest --stanza=EMI_Repo archive-push %p && cp %p /data/archive/%f'
                                # placeholders: %p = path of file to archive
                                #               %f = file name only

[root@db1 pg_wal]# du -h
20K     ./archive_status
5.1G    .
[root@db1 pg_wal]#
Re: PgBackRest and WAL archive expiry
> On Sep 19, 2024, at 22:46, KK CHN wrote:
>
> 1. In the EPAS server I have postgres.conf with
> archive_command = 'pgbackrest --stanza=EMI_Repo archive-push %p && cp %p
> /data/archive/%f'
>
> The problem is that the /data/archive folder is growing within a few days
> to 850GB of 2 TB partition.

The /data/archive directory is entirely under your control; pgbackrest and PostgreSQL don't manage it in any way. It will just keep growing indefinitely unless you take action to delete the WAL segments out of it.

There's no real benefit in maintaining that separate /data/archive directory; pgbackrest archives and manages the lifecycle of the WAL segments in its repository. I wouldn't bother with that separate archive, and would just use pgbackrest.
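On the repository side, WAL expiry is driven by pgbackrest's retention settings together with the expire command. A sketch of the relevant knobs in pgbackrest.conf (pick retention values that actually match your PITR window):

[global]
repo1-retention-full=2                # keep the two most recent full backups
repo1-retention-archive-type=full     # expire WAL not needed by the retained backups

WAL in the repository is then pruned as part of each backup, or on demand with:

pgbackrest --stanza=EMI_Repo expire

If you do decide to keep the local /data/archive copy for some reason, you have to prune it yourself, e.g. with pg_archivecleanup against the oldest WAL segment still needed by the oldest backup you want to keep (visible in "pgbackrest info"):

pg_archivecleanup /data/archive 000000010000004A00000028   # hypothetical oldest-needed segment name

But again: simplest is to drop the extra "cp" from archive_command and let pgbackrest own WAL retention.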
Synchronize the dump with a logical slot with --snapshot
Hi Team,

--snapshot=snapshotname
    (Use the specified synchronized snapshot when making a dump of the database. This option is useful when needing to synchronize the dump with a logical replication slot.)

as per the docs. How do we synchronize the dump with a logical replication slot using --snapshot?

I am using PostgreSQL 14, which supports only pg_create_logical_replication_slot(). How do I generate a snapshot with it? The CREATE_REPLICATION_SLOT shown below is not supported by PostgreSQL 14. Example:

osdb_lsr=# CREATE_REPLICATION_SLOT lsr_sync_01 LOGICAL pgoutput;
  slot_name  | consistent_point | snapshot_name | output_plugin
-------------+------------------+---------------+---------------
 lsr_sync_01 | 0/C000110        | 0003-0002-1   | pgoutput

Regards,
Durga Mahesh
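(For reference, the flow being asked about looks roughly like this, assuming CREATE_REPLICATION_SLOT is issued over a replication-protocol connection that is then left idle until the dump finishes -- a sketch, not verified:)

# session 1: replication-protocol connection; create the slot and leave the session open and idle
$ psql "dbname=osdb_lsr replication=database"
osdb_lsr=# CREATE_REPLICATION_SLOT lsr_sync_01 LOGICAL pgoutput;
(note the snapshot_name it returns)

# session 2: run the dump against that snapshot while session 1 is still open
$ pg_dump -d osdb_lsr --snapshot=<snapshot_name from above> -f osdb_lsr.dump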
Re: Need assistance in converting subqueries to joins
Siraj G writes:
> Please find below the query in the format and its execution plan:

[ blink... ]  I'm not sure what you are using there, but it is *not*
Postgres.  There are assorted entries in the execution plan that
community Postgres has never heard of, such as

> -> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table
> (weedout) (cost=2085.53 rows=1988) (actual time=0.321..22600.652
> rows=10298 loops=1)

> -> Single-row index lookup on P0 using IS_PROJ_PK
> (IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID) (cost=0.63
> rows=1) (actual time=0.000..0.000 rows=1 loops=50)

Maybe this is RDS, or Aurora, or Greenplum, or one of many other
commercial forks of Postgres?  In any case you'd get more on-point
advice from their support forums than from the PG community.  It looks
like this is a fork that has installed its own underlying table engine,
meaning that what we know about performance may not be terribly relevant.

			regards, tom lane
Re: How batch processing works
Below are the results for the posted methods. Tested it locally and it gave no difference in timing between method-2 and method-3. Somehow it failed to run in dbfiddle.

Also, I was initially worried that adding the trigger to our target table would worsen performance, because it would make all the execution "row by row" rather than a true batch insert (method-3 as posted), with a larger number of context switches. But it seems it will still be doing the batch commits (the way method-2 does), so as per that we won't lose any performance as such. Is this understanding correct?

Method-1- 00:01:44.48
Method-2- 00:00:02.67
Method-3- 00:00:02.39

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6

On Thu, Sep 19, 2024 at 6:42 PM Lok P wrote:
>
> On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson wrote:
>
>> On Thu, Sep 19, 2024 at 5:24 AM Lok P wrote:
>>>
>>> [snip]
>>>
>>> DO $$
>>> DECLARE
>>>     num_inserts INTEGER := 10;
>>>     batch_size INTEGER := 50;
>>> [snip]
>>> END $$;
>>
>> Reproduce what behavior?
>>
>> Anyway, plpgsql functions (including anonymous DO statements) are -- to
>> Postgresql -- single statements.  Thus, they'll be faster than
>> individual calls.
>>
>> An untrusted language like plpython3u might speed things up even more, if
>> you have to read a heterogeneous external file and insert all the records
>> into the db.
>
> Here if you see my script, method-1 is doing a commit after each row
> insert. And method-2 is doing a batch commit i.e. commit after every "50"
> rows. And method-3 is doing a true batch insert i.e. combining all the 50
> values in one insert statement and submitting to the database in one shot
> and then COMMITing it, so the context switching will be a lot less. So I
> was expecting method-3 to be the fastest way to insert the rows here, but
> the response time shows the same response time for method-2 and method-3.
> Method-1 is the slowest, though.
Need assistance in converting subqueries to joins
Hello Tech gents!

I am sorry if I am asking the wrong question to this group, but wanted assistance in converting a query replacing subqueries with joins.

Please find the query below (whose cost is very high):

select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
(SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM IS_PROJ P0, TMP_IS_SEC_FILT T0,
EM_MD R0 WHERE T0.IS_REPOSITORY_GUID = R0.REP_GUID AND
T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_REPOSITORY_ID = R0.REP_ID);

Regards
Siraj
Re: Need assistance in converting subqueries to joins
On 9/19/24 21:07, Siraj G wrote:
> Hello Tech gents!
>
> I am sorry if I am asking the wrong question to this group, but wanted
> assistance in converting a query replacing subqueries with joins.
>
> Please find the query below (whose cost is very high):

Add the output of the EXPLAIN ANALYZE for the query.

> select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
> (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
> T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
> IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM
> IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID =
> R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND
> P0.IS_REPOSITORY_ID = R0.REP_ID);

For future reference formatting the query here:

https://sqlformat.darold.net/

helps get it into a form that is easier to follow:

SELECT
    em_exists_id
FROM
    IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID)
NOT IN (
    SELECT
        IS_OBJ_GUID
    FROM
        TMP_IS_SEC_FILT T0,
        IS_PROJ P0
    WHERE
        T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
        AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
AND (IS_PROJ_ID) IN (
    SELECT
        IS_PROJ_ID
    FROM
        IS_PROJ P0,
        TMP_IS_SEC_FILT T0,
        EM_MD R0
    WHERE
        T0.IS_REPOSITORY_GUID = R0.REP_GUID
        AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
        AND P0.IS_REPOSITORY_ID = R0.REP_ID);

> Regards
> Siraj

--
Adrian Klaver
adrian.kla...@aklaver.com
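For what it's worth, the same filter can be written with EXISTS/NOT EXISTS, which planners generally turn into (anti-)joins. A sketch only -- untested, and equivalent to the NOT IN form only if IS_OBJ_GUID can never be NULL in that subquery:

SELECT f.em_exists_id
FROM IS_SEC_FILT f
WHERE NOT EXISTS (
        SELECT 1
        FROM TMP_IS_SEC_FILT t0
        JOIN IS_PROJ p0 ON t0.IS_PROJ_GUID = p0.IS_PROJ_GUID
        WHERE p0.IS_PROJ_ID = f.IS_PROJ_ID
          AND t0.IS_OBJ_GUID = f.IS_SEC_FILT_GUID)
  AND EXISTS (
        SELECT 1
        FROM IS_PROJ p0
        JOIN TMP_IS_SEC_FILT t0 ON t0.IS_PROJ_GUID = p0.IS_PROJ_GUID
        JOIN EM_MD r0 ON t0.IS_REPOSITORY_GUID = r0.REP_GUID
                     AND p0.IS_REPOSITORY_ID = r0.REP_ID
        WHERE p0.IS_PROJ_ID = f.IS_PROJ_ID);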
Re: Need assistance in converting subqueries to joins
Hello Adrian!

Please find below the query in the format and its execution plan:

SELECT em_exists_id
FROM IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID) NOT IN (
    SELECT IS_OBJ_GUID
    FROM TMP_IS_SEC_FILT T0, IS_PROJ P0
    WHERE T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
      AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
AND (IS_PROJ_ID) IN (
    SELECT IS_PROJ_ID
    FROM IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0
    WHERE T0.IS_REPOSITORY_GUID = R0.REP_GUID
      AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
      AND P0.IS_REPOSITORY_ID = R0.REP_ID);

Query plan:

-> Aggregate: count(0)  (cost=2284.32 rows=1988) (actual time=22602.583..22602.584 rows=1 loops=1)
    -> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table (weedout)  (cost=2085.53 rows=1988) (actual time=0.321..22600.652 rows=10298 loops=1)
        -> Filter: (IS_SEC_FILT.IS_SEC_FILT_GUID,(select #2) is false)  (cost=2085.53 rows=1988) (actual time=0.315..22433.412 rows=514900 loops=1)
            -> Inner hash join (IS_SEC_FILT.IS_PROJ_ID = P0.IS_PROJ_ID)  (cost=2085.53 rows=1988) (actual time=0.188..96.362 rows=517350 loops=1)
                -> Index scan on IS_SEC_FILT using IS_SEC_FILT_PK  (cost=28.84 rows=19879) (actual time=0.019..7.386 rows=20086 loops=1)
                -> Hash
                    -> Nested loop inner join  (cost=8.05 rows=1) (actual time=0.064..0.132 rows=50 loops=1)
                        -> Inner hash join (T0.IS_REPOSITORY_GUID = R0.REP_GUID)  (cost=1.70 rows=1) (actual time=0.047..0.094 rows=50 loops=1)
                            -> Filter: (T0.IS_PROJ_GUID is not null)  (cost=0.38 rows=5) (actual time=0.010..0.041 rows=50 loops=1)
                                -> Table scan on T0  (cost=0.38 rows=50) (actual time=0.010..0.037 rows=50 loops=1)
                            -> Hash
                                -> Filter: (R0.REP_ID is not null)  (cost=0.45 rows=2) (actual time=0.022..0.025 rows=2 loops=1)
                                    -> Table scan on R0  (cost=0.45 rows=2) (actual time=0.021..0.023 rows=2 loops=1)
                        -> Filter: (P0.IS_REPOSITORY_ID = R0.REP_ID)  (cost=0.63 rows=1) (actual time=0.001..0.001 rows=1 loops=50)
                            -> Single-row index lookup on P0 using IS_PROJ_PK (IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID)  (cost=0.63 rows=1) (actual time=0.000..0.000 rows=1 loops=50)
        -> Select #2 (subquery in condition; dependent)
            -> Limit: 1 row(s)  (cost=5.98 rows=1) (actual time=0.043..0.043 rows=0 loops=517350)
                -> Filter: (T0.IS_OBJ_GUID)  (cost=5.98 rows=1) (actual time=0.043..0.043 rows=0 loops=517350)
                    -> Filter: (((IS_SEC_FILT.IS_SEC_FILT_GUID) = T0.IS_OBJ_GUID) or (T0.IS_OBJ_GUID is null))  (cost=5.98 rows=1) (actual time=0.042..0.042 rows=0 loops=517350)
                        -> Inner hash join (T0.IS_PROJ_GUID = P0.IS_PROJ_GUID)  (cost=5.98 rows=1) (actual time=0.004..0.038 rows=50 loops=517350)
                            -> Table scan on T0  (cost=0.35 rows=50) (actual time=0.001..0.022 rows=50 loops=517350)
                            -> Hash
                                -> Single-row index lookup on P0 using PRIMARY (IS_PROJ_ID=IS_SEC_FILT.IS_PROJ_ID)  (cost=0.72 rows=1) (actual time=0.001..0.001 rows=1 loops=517350)

On Fri, Sep 20, 2024 at 9:49 AM Adrian Klaver wrote:

> On 9/19/24 21:07, Siraj G wrote:
> > Hello Tech gents!
> >
> > I am sorry if I am asking the wrong question to this group, but wanted
> > assistance in converting a query replacing subqueries with joins.
> >
> > Please find the query below (whose cost is very high):
>
> Add the output of the EXPLAIN ANALYZE for the query.
>
> > select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
> > (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
> > T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
> > IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM
> > IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID =
> > R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND
> > P0.IS_REPOSITORY_ID = R0.REP_ID);
>
> For future reference formatting the query here:
>
> https://sqlformat.darold.net/
>
> helps get it into a form that is easier to follow:
>
> SELECT
>     em_exists_id
> FROM
>     IS_SEC_FILT
> WHERE (IS_SEC_FILT_GUID)
> NOT IN (
>     SELECT
>         IS_OBJ_GUID
>     FROM
>         TMP_IS_SEC_FILT T0,
>         IS_PROJ P0
>     WHERE
>         T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
>         AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
> AND (IS_PROJ_ID) IN (
>     SELECT
>         IS_PROJ_ID
>     FROM
>         IS_PROJ P0,
>         TMP_IS_SEC_FILT T0,
>         EM_MD R0
>     WHERE
>         T0.IS_REPOSITORY_GUID = R0.REP_GUID
>         AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
>         AND P0.IS_REPOSITORY_ID = R0.REP_ID);
>
> > Regards
> > Siraj
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
Re: How batch processing works
On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson wrote:

> On Thu, Sep 19, 2024 at 5:24 AM Lok P wrote:
>>
>> [snip]
>>
>> DO $$
>> DECLARE
>>     num_inserts INTEGER := 10;
>>     batch_size INTEGER := 50;
>>     start_time TIMESTAMP;
>>     end_time TIMESTAMP;
>>     elapsed_time INTERVAL;
>>     i INTEGER;
>> BEGIN
>>     -- Method 1: Individual Inserts with Commit after every Row
>>     start_time := clock_timestamp();
>>
>>     FOR i IN 1..num_inserts LOOP
>>         INSERT INTO parent_table VALUES (i, 'a');
>>         COMMIT;
>>     END LOOP;
>>
>>     end_time := clock_timestamp();
>>     elapsed_time := end_time - start_time;
>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>     VALUES ('Method 1: Individual Inserts with Commit after every Row', start_time, end_time, elapsed_time);
>>
>>     -- Method 2: Individual Inserts with Commit after 100 Rows
>>     start_time := clock_timestamp();
>>
>>     FOR i IN 1..num_inserts LOOP
>>         INSERT INTO parent_table2 VALUES (i, 'a');
>>         -- Commit after every 100 rows
>>         IF i % batch_size = 0 THEN
>>             COMMIT;
>>         END IF;
>>     END LOOP;
>>
>>     -- Final commit if not already committed
>>     commit;
>>
>>     end_time := clock_timestamp();
>>     elapsed_time := end_time - start_time;
>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>     VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', start_time, end_time, elapsed_time);
>>
>>     -- Method 3: Batch Inserts with Commit after all
>>     start_time := clock_timestamp();
>>
>>     FOR i IN 1..(num_inserts / batch_size) LOOP
>>         INSERT INTO parent_table3 VALUES
>>             (1 + (i - 1) * batch_size, 'a'),
>> [snip]
>>             (49 + (i - 1) * batch_size, 'a'),
>>             (50 + (i - 1) * batch_size, 'a');
>>         COMMIT;
>>     END LOOP;
>>
>>     COMMIT; -- Final commit for all
>>     end_time := clock_timestamp();
>>     elapsed_time := end_time - start_time;
>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>     VALUES ('Method 3: Batch Inserts with Commit after All', start_time, end_time, elapsed_time);
>>
>> END $$;
>
> Reproduce what behavior?
>
> Anyway, plpgsql functions (including anonymous DO statements) are -- to
> Postgresql -- single statements.  Thus, they'll be faster than
> individual calls.
>
> An untrusted language like plpython3u might speed things up even more, if
> you have to read a heterogeneous external file and insert all the records
> into the db.

Here if you see my script, method-1 is doing a commit after each row insert. And method-2 is doing a batch commit i.e. commit after every "50" rows. And method-3 is doing a true batch insert i.e. combining all the 50 values in one insert statement and submitting to the database in one shot and then COMMITing it, so the context switching will be a lot less. So I was expecting method-3 to be the fastest way to insert the rows here, but the response time shows the same response time for method-2 and method-3. Method-1 is the slowest, though.
Re: How batch processing works
On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson wrote:

> [snip]
>
>> Method-4
>>
>> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
>> INSERT INTO child_table VALUES (1,1, 'a'), (1,2, 'a');
>> commit;
>
> If I knew that I had to load a structured input data file (even if it had
> parent and child records), this is how I'd do it (but probably first try
> and see if "in-memory COPY INTO" is such a thing).

I was trying to reproduce this behaviour using row by row commit vs just batch commit vs true batch insert as you mentioned, but I am not able to see any difference between the "batch commit" and "true batch insert" responses. Am I missing anything?

CREATE TABLE debug_log (
    method1      TEXT,
    start_time   TIMESTAMP,
    end_time     TIMESTAMP,
    elapsed_time INTERVAL
);

CREATE TABLE parent_table (
    id   SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE parent_table2 (
    id   SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE parent_table3 (
    id   SERIAL PRIMARY KEY,
    name TEXT
);

DO $$
DECLARE
    num_inserts  INTEGER := 10;
    batch_size   INTEGER := 50;
    start_time   TIMESTAMP;
    end_time     TIMESTAMP;
    elapsed_time INTERVAL;
    i            INTEGER;
BEGIN
    -- Method 1: Individual Inserts with Commit after every Row
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table VALUES (i, 'a');
        COMMIT;
    END LOOP;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 1: Individual Inserts with Commit after every Row', start_time, end_time, elapsed_time);

    -- Method 2: Individual Inserts with Commit after 100 Rows
    start_time := clock_timestamp();

    FOR i IN 1..num_inserts LOOP
        INSERT INTO parent_table2 VALUES (i, 'a');
        -- Commit after every 100 rows
        IF i % batch_size = 0 THEN
            COMMIT;
        END IF;
    END LOOP;

    -- Final commit if not already committed
    COMMIT;

    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', start_time, end_time, elapsed_time);

    -- Method 3: Batch Inserts with Commit after all
    start_time := clock_timestamp();

    FOR i IN 1..(num_inserts / batch_size) LOOP
        INSERT INTO parent_table3 VALUES
            (1 + (i - 1) * batch_size, 'a'),  (2 + (i - 1) * batch_size, 'a'),
            (3 + (i - 1) * batch_size, 'a'),  (4 + (i - 1) * batch_size, 'a'),
            (5 + (i - 1) * batch_size, 'a'),  (6 + (i - 1) * batch_size, 'a'),
            (7 + (i - 1) * batch_size, 'a'),  (8 + (i - 1) * batch_size, 'a'),
            (9 + (i - 1) * batch_size, 'a'),  (10 + (i - 1) * batch_size, 'a'),
            (11 + (i - 1) * batch_size, 'a'), (12 + (i - 1) * batch_size, 'a'),
            (13 + (i - 1) * batch_size, 'a'), (14 + (i - 1) * batch_size, 'a'),
            (15 + (i - 1) * batch_size, 'a'), (16 + (i - 1) * batch_size, 'a'),
            (17 + (i - 1) * batch_size, 'a'), (18 + (i - 1) * batch_size, 'a'),
            (19 + (i - 1) * batch_size, 'a'), (20 + (i - 1) * batch_size, 'a'),
            (21 + (i - 1) * batch_size, 'a'), (22 + (i - 1) * batch_size, 'a'),
            (23 + (i - 1) * batch_size, 'a'), (24 + (i - 1) * batch_size, 'a'),
            (25 + (i - 1) * batch_size, 'a'), (26 + (i - 1) * batch_size, 'a'),
            (27 + (i - 1) * batch_size, 'a'), (28 + (i - 1) * batch_size, 'a'),
            (29 + (i - 1) * batch_size, 'a'), (30 + (i - 1) * batch_size, 'a'),
            (31 + (i - 1) * batch_size, 'a'), (32 + (i - 1) * batch_size, 'a'),
            (33 + (i - 1) * batch_size, 'a'), (34 + (i - 1) * batch_size, 'a'),
            (35 + (i - 1) * batch_size, 'a'), (36 + (i - 1) * batch_size, 'a'),
            (37 + (i - 1) * batch_size, 'a'), (38 + (i - 1) * batch_size, 'a'),
            (39 + (i - 1) * batch_size, 'a'), (40 + (i - 1) * batch_size, 'a'),
            (41 + (i - 1) * batch_size, 'a'), (42 + (i - 1) * batch_size, 'a'),
            (43 + (i - 1) * batch_size, 'a'), (44 + (i - 1) * batch_size, 'a'),
            (45 + (i - 1) * batch_size, 'a'), (46 + (i - 1) * batch_size, 'a'),
            (47 + (i - 1) * batch_size, 'a'), (48 + (i - 1) * batch_size, 'a'),
            (49 + (i - 1) * batch_size, 'a'), (50 + (i - 1) * batch_size, 'a');
        COMMIT;
    END LOOP;

    COMMIT; -- Final commit for all
    end_time := clock_timestamp();
    elapsed_time := end_time - start_time;
    INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
    VALUES ('Method 3: Batch Inserts with Commit after All', start_time, end_time, elapsed_time);

END $$;
Re: IO related waits
On 9/19/24 05:24, Greg Sabino Mullane wrote:
> On Thu, Sep 19, 2024 at 5:17 AM veem v
>
> This is really difficult to diagnose from afar with only snippets of logs
> and half-complete descriptions of your business logic. Pull everyone
> involved into a room with a whiteboard, and produce a document describing
> exactly what your application does, and how it is doing it. Switch from
> reactive to proactive.

+1

> Cheers,
> Greg

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: glibc update 2.31 to 2.38
Paul Foerster writes:
> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to
> install the SLES 15.6 update which contains glibc 2.38.

> I have built our PostgreSQL software from source on SLES 15.5, because we
> have some special requirements which the packages cannot fulfill. So I have
> questions:

> 1) Do I have to build it again on 15.6?

No, I wouldn't expect that to be necessary.

> 2) Does the glibc update have any impact?

Maybe.  We don't really track glibc changes, so I can't say for sure,
but it might be advisable to reindex indexes on string columns.

			regards, tom lane
Re: glibc update 2.31 to 2.38
On 9/19/24 11:14, Tom Lane wrote:
> Paul Foerster writes:
>> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to
>> install the SLES 15.6 update which contains glibc 2.38.

>> 2) Does the glibc update have any impact?
>
> Maybe.  We don't really track glibc changes, so I can't say for sure,
> but it might be advisable to reindex indexes on string columns.

Every glibc major version change potentially impacts the sorting of some strings, which would require reindexing. Whether your actual data trips into any of these changes is another matter.

You could check by doing something equivalent to this on every collatable column with an index built on it, in every table:

8<---
WITH t(s) AS (SELECT <col> FROM <tab> ORDER BY 1)
SELECT md5(string_agg(t.s, NULL)) FROM t;
8<---

Check the before and after glibc upgrade result -- if it is the same, you are good to go. If not, rebuild the index before *any* DML is done to the table.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
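To enumerate which indexes to run that check against, a catalog query along these lines can serve as a rough starting point (it lists indexes containing at least one collatable column; you may want to additionally exclude columns using the C or POSIX collations):

SELECT DISTINCT i.indexrelid::regclass AS index_name,
                i.indrelid::regclass  AS table_name
FROM   pg_index i
JOIN   pg_attribute a ON a.attrelid = i.indexrelid
WHERE  a.attcollation <> 0   -- the index column has a collation, i.e. is collatable
ORDER  BY 2, 1;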
Re: glibc update 2.31 to 2.38
On 9/19/24 13:07, Joe Conway wrote:
> On 9/19/24 11:14, Tom Lane wrote:
>> Paul Foerster writes:
>>> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to
>>> install the SLES 15.6 update which contains glibc 2.38.

>>> 2) Does the glibc update have any impact?
>>
>> Maybe.  We don't really track glibc changes, so I can't say for sure,
>> but it might be advisable to reindex indexes on string columns.
>
> Every glibc major version change potentially impacts the sorting of some
> strings, which would require reindexing. Whether your actual data trips into
> any of these changes is another matter.
>
> You could check by doing something equivalent to this on every collatable
> column with an index built on it, in every table:
>
> 8<---
> WITH t(s) AS (SELECT <col> FROM <tab> ORDER BY 1)
> SELECT md5(string_agg(t.s, NULL)) FROM t;
> 8<---
>
> Check the before and after glibc upgrade result -- if it is the same, you are
> good to go. If not, rebuild the index before *any* DML is done to the table.

... and I should have mentioned that in a similar way, if you have any tables that are partitioned by range on collatable columns, the partition boundaries are potentially affected. Similarly, constraints involving expressions on collatable columns may be affected.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: glibc update 2.31 to 2.38
On 2024-09-19 16:37:41 +0200, Paul Foerster wrote:
> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's
> about to install the SLES 15.6 update which contains glibc 2.38.
>
> I have built our PostgreSQL software from source on SLES 15.5, because
> we have some special requirements which the packages cannot fulfill.
> So I have questions:
>
> 1) Do I have to build it again on 15.6?
>
> 2) Does the glibc update have any impact? I recall having to have
> everything reindexed when the 2.28 update came due to major locale
> changes, but I didn't have to do it since then.
>
> 3) Where and how can I find out if it is necessary to reindex?

I wrote a small script[1] which prints all unicode code points and a few
selected[2] longer strings in order. If you run that before and after
the upgrade and the output doesn't change, you are probably fine.
(It checks only the default collation, though: If you have indexes using
a different collation you would have to modify the script accordingly.)

If there are differences, closer inspection might show that the changes
don't affect you. But I would reindex all indexes on text (etc.) columns
just to be sure.

        hp

[1] https://git.hjp.at:3000/hjp/pgcollate
[2] The selection is highly subjective and totally unscientific.
    Additions are welcome.

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
Re: glibc update 2.31 to 2.38
Hi Adrian,

> On 19 Sep 2024, at 17:00, Adrian Klaver wrote:
>
> I would take a look at:
>
> https://wiki.postgresql.org/wiki/Locale_data_changes
>
> It refers to the glibc 2.28 change in particular, but includes some generic
> tips that could prove useful.
>
> The glibc change log below might also be useful:
>
> https://sourceware.org/glibc/wiki/Release

I've seen those before, but since the article only refers to 2.28 and SUSE 15.3, and I couldn't find anything in the glibc release notes, I thought I'd ask.

Cheers,
Paul
Re: glibc update 2.31 to 2.38
Hi Tom,

> On 19 Sep 2024, at 17:14, Tom Lane wrote:
>
> No, I wouldn't expect that to be necessary.

I was hoping one of the pros would say that. 🤣

> Maybe.  We don't really track glibc changes, so I can't say for sure,
> but it might be advisable to reindex indexes on string columns.

Advisable is a word I unfortunately can't do much with. We have terabytes and terabytes of data in hundreds of databases, each having potentially hundreds of columns that are candidates. Just reindexing and taking down applications during that time is not an option in a 24x7 high availability environment.

Cheers,
Paul
glibc update 2.31 to 2.38
Hi,

we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to install the SLES 15.6 update which contains glibc 2.38.

I have built our PostgreSQL software from source on SLES 15.5, because we have some special requirements which the packages cannot fulfill. So I have questions:

1) Do I have to build it again on 15.6?

2) Does the glibc update have any impact? I recall having to have everything reindexed when the 2.28 update came due to major locale changes, but I didn't have to do it since then.

3) Where and how can I find out if it is necessary to reindex? And how can I find out what indexes would be affected.

I'd really appreciate your comments. Thanks very much in advance.

Paul
Re: glibc update 2.31 to 2.38
On 9/19/24 07:37, Paul Foerster wrote:
> Hi,
>
> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to
> install the SLES 15.6 update which contains glibc 2.38.
>
> I have built our PostgreSQL software from source on SLES 15.5, because we
> have some special requirements which the packages cannot fulfill. So I have
> questions:
>
> 1) Do I have to build it again on 15.6?
>
> 2) Does the glibc update have any impact? I recall having to have everything
> reindexed when the 2.28 update came due to major locale changes, but I didn't
> have to do it since then.
>
> 3) Where and how can I find out if it is necessary to reindex? And how can I
> find out what indexes would be affected.
>
> I'd really appreciate your comments. Thanks very much in advance.

I would take a look at:

https://wiki.postgresql.org/wiki/Locale_data_changes

It refers to the glibc 2.28 change in particular, but includes some generic tips that could prove useful.

The glibc change log below might also be useful:

https://sourceware.org/glibc/wiki/Release

> Paul

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: IO related waits
On Thu, 19 Sept 2024 at 03:02, Adrian Klaver wrote:

> This needs clarification.
>
> 1) To be clear when you refer to parent and child that is:
> FK
> parent_tbl.fld <--> child_tbl.fld_fk
>
> not parent and child tables in partitioning scheme?
>
> 2) What are the table schemas?
>
> 3) What is the code that is generating the error?
>
> Overall it looks like this process needs a top to bottom audit to map
> out what is actually being done versus what needs to be done.

Yes, the tables actually have a parent and child table relationship, not the child/parent tables of a partitioning scheme. The PK and FK are on the columns (ID, part_date). The table is daily range partitioned on column part_date.

The error we are seeing is as below in the logs:

deadlock detected
2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:  Process 14537 waits for ShareLock on transaction 220975629; blocked by process 14548.
        Process 14548 waits for ShareLock on transaction 220975630; blocked by process 14537.
        Process 14537: INSERT INTO TRANDB.PART_TAB (ID, part_date) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
        Process 14548: INSERT INTO TRANDB.PART_TAB (ID, part_date, ...) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:HINT:  See server log for query details.
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:CONTEXT:  while inserting index tuple (88814,39) in relation "PART_TAB_p2024_08_29"
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  INSERT INTO TRANDB.PART_TAB (ID, part_date, ...) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:  INSERT INTO TRANDB.EXCEP_TAB (...)
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:  current transaction is aborted, commands ignored until end of transaction block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:  current transaction is aborted, commands ignored until end of transaction block

2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:ERROR:  deadlock detected
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:DETAIL:  Process 17456 waits for ShareLock on transaction 220978890; blocked by process 17458.
        Process 17458 waits for ShareLock on transaction 220978889; blocked by process 17456.
        Process 17456: INSERT INTO TRANDB.PART_TAB (ID, part_date, ...) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
        Process 17458: INSERT INTO TRANDB.PART_TAB (ID, part_date, ..) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:HINT:  See server log for query details.
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:CONTEXT:  while inserting index tuple (88875,13) in relation "PART_TAB_p2024_08_29"
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:STATEMENT:  INSERT INTO TRANDB.PART_TAB (ID, part_date, ..) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO NOTHING
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:LOG:  could not receive data from client: Connection reset by peer
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:LOG:  disconnection: session time: 0:08:37.154 user=USER1 database=TRANDB host=X port=58778
Re: IO related waits
On Thu, Sep 19, 2024 at 5:17 AM veem v wrote:

> 2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:
> Process 14537 waits for ShareLock on transaction 220975629; blocked by
> process 14548.

You need to find out exactly what commands, and in what order, all these processes are doing. Deadlocks can be avoided by rearranging your application logic.

> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
> current transaction is aborted, commands ignored until end of transaction
> block
> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
> INSERT INTO TRANDB.EXCEP_TAB (...)
> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
> current transaction is aborted, commands ignored until end of transaction
> block
> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>
> 2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:
> current transaction is aborted, commands ignored until end of transaction
> block

Fix your application. It should be checking that each command completed and not just blindly pushing on to the next statement while ignoring the error.

This is really difficult to diagnose from afar with only snippets of logs and half-complete descriptions of your business logic. Pull everyone involved into a room with a whiteboard, and produce a document describing exactly what your application does, and how it is doing it. Switch from reactive to proactive.

Cheers,
Greg
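As a concrete illustration of "rearranging your application logic": if every session inserts its batch in the same key order, two sessions may block each other briefly but cannot deadlock on those rows. A sketch only (staging_batch is a hypothetical per-session staging table):

-- load each batch in a consistent (ID, part_date) order so that two
-- concurrent sessions never take row/index locks in opposite orders
INSERT INTO TRANDB.PART_TAB (ID, part_date)
SELECT s.ID, s.part_date
FROM staging_batch s
ORDER BY s.ID, s.part_date
ON CONFLICT (ID, part_date) DO NOTHING;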
Re: How batch processing works
On Thu, Sep 19, 2024 at 5:24 AM Lok P wrote:

> On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson wrote:
>>
>> [snip]
>>
>>> Method-4
>>>
>>> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
>>> INSERT INTO child_table VALUES (1,1, 'a'), (1,2, 'a');
>>> commit;
>>
>> If I knew that I had to load a structured input data file (even if it had
>> parent and child records), this is how I'd do it (but probably first try
>> and see if "in-memory COPY INTO" is such a thing).
>
> I was trying to reproduce this behaviour using row by row commit vs just
> batch commit vs true batch insert as you mentioned, i am not able to see
> any difference between "batch commit" and "true batch insert" response. Am
> I missing anything?
>
> [snip]
> DO $$
> DECLARE
>     num_inserts INTEGER := 10;
>     batch_size INTEGER := 50;
>     start_time TIMESTAMP;
>     end_time TIMESTAMP;
>     elapsed_time INTERVAL;
>     i INTEGER;
> BEGIN
>     -- Method 1: Individual Inserts with Commit after every Row
>     start_time := clock_timestamp();
>
>     FOR i IN 1..num_inserts LOOP
>         INSERT INTO parent_table VALUES (i, 'a');
>         COMMIT;
>     END LOOP;
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 1: Individual Inserts with Commit after every Row', start_time, end_time, elapsed_time);
>
>     -- Method 2: Individual Inserts with Commit after 100 Rows
>     start_time := clock_timestamp();
>
>     FOR i IN 1..num_inserts LOOP
>         INSERT INTO parent_table2 VALUES (i, 'a');
>         -- Commit after every 100 rows
>         IF i % batch_size = 0 THEN
>             COMMIT;
>         END IF;
>     END LOOP;
>
>     -- Final commit if not already committed
>     commit;
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 2: Individual Inserts with Commit after 100 Rows', start_time, end_time, elapsed_time);
>
>     -- Method 3: Batch Inserts with Commit after all
>     start_time := clock_timestamp();
>
>     FOR i IN 1..(num_inserts / batch_size) LOOP
>         INSERT INTO parent_table3 VALUES
>             (1 + (i - 1) * batch_size, 'a'),
> [snip]
>             (49 + (i - 1) * batch_size, 'a'),
>             (50 + (i - 1) * batch_size, 'a');
>         COMMIT;
>     END LOOP;
>
>     COMMIT; -- Final commit for all
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 3: Batch Inserts with Commit after All', start_time, end_time, elapsed_time);
>
> END $$;

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to Postgresql -- single statements.  Thus, they'll be faster than individual calls.

An untrusted language like plpython3u might speed things up even more, if you have to read a heterogeneous external file and insert all the records into the db.

--
Death to , and butter sauce. Don't boil me, I'm still alive. crustacean!
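(And if the rows really do originate in a flat file, COPY is usually faster than any of the INSERT variants above; a minimal sketch, assuming a CSV whose columns match the table:)

\copy parent_table3 (id, name) FROM '/path/to/input.csv' WITH (FORMAT csv)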
Re: IO related waits
On Thu, 19 Sept 2024 at 17:54, Greg Sabino Mullane wrote:

> On Thu, Sep 19, 2024 at 5:17 AM veem v wrote:
>
>> 2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:
>> Process 14537 waits for ShareLock on transaction 220975629; blocked by
>> process 14548.
>
> You need to find out exactly what commands, and in what order, all these
> processes are doing. Deadlocks can be avoided by rearranging your
> application logic.
>
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
>> current transaction is aborted, commands ignored until end of transaction
>> block
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>> INSERT INTO TRANDB.EXCEP_TAB (...)
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
>> current transaction is aborted, commands ignored until end of transaction
>> block
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>>
>> 2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:
>> current transaction is aborted, commands ignored until end of transaction
>> block
>
> Fix your application. It should be checking that each command completed
> and not just blindly pushing on to the next statement while ignoring the
> error.
>
> This is really difficult to diagnose from afar with only snippets of logs
> and half-complete descriptions of your business logic. Pull everyone
> involved into a room with a whiteboard, and produce a document describing
> exactly what your application does, and how it is doing it. Switch from
> reactive to proactive.

Thank you, Greg.

I was thinking there might be some oddity or difference in behaviour here in Postgres compared to others, because I have seen deadlocks due to UPDATEs but never with INSERT queries before in other databases (though here we have "insert on conflict do nothing"). But I am now thinking: here we have foreign keys and primary keys, and if the same PK gets inserted from multiple sessions then one will wait if the other has not been committed, and that might be creating a situation of locking first and subsequently a deadlock.

But we are doing batch inserts from multiple sessions, and each session should first insert into the parent and then into the child table for the related PK and FK, and the batches should not overlap across sessions. So I will check if there is a loophole there.

Also, another thing we encountered here: if the session errors out with any error (maybe a deadlock etc.), it's not executing any further transactions and errors out with "current transaction aborted, command ignored until end of transaction block". It seems it needs an explicit "rollback"; it will not roll back by default, which is what I was expecting it to do.

Regards
Veem
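A minimal illustration of that aborted-transaction behaviour (t is a hypothetical table with a primary key):

BEGIN;
INSERT INTO t VALUES (1);   -- ok
INSERT INTO t VALUES (1);   -- fails: duplicate key
INSERT INTO t VALUES (2);   -- ERROR: current transaction is aborted, commands ignored ...
ROLLBACK;                   -- an explicit ROLLBACK is needed before the session is usable again

-- with a savepoint, only the failed statement's work is discarded:
BEGIN;
INSERT INTO t VALUES (1);   -- ok
SAVEPOINT sp;
INSERT INTO t VALUES (1);   -- fails: duplicate key
ROLLBACK TO SAVEPOINT sp;   -- transaction stays usable
INSERT INTO t VALUES (2);   -- ok
COMMIT;                     -- rows 1 and 2 are committed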
Re: glibc update 2.31 to 2.38
Hi Joe,

> On 19 Sep 2024, at 19:07, Joe Conway wrote:
>
> Every glibc major version change potentially impacts the sorting of some
> strings, which would require reindexing. Whether your actual data trips into
> any of these changes is another matter.
>
> You could check by doing something equivalent to this on every collatable
> column with an index built on it, in every table:
>
> 8<---
> WITH t(s) AS (SELECT <col> FROM <tab> ORDER BY 1)
> SELECT md5(string_agg(t.s, NULL)) FROM t;
> 8<---
>
> Check the before and after glibc upgrade result -- if it is the same, you are
> good to go. If not, rebuild the index before *any* DML is done to the table.

I like the neatness of this one. I'm thinking about how to implement it on hundreds of databases with hundreds of columns. That'll be a challenge, but at least it's a start. Thanks very much for this one.

Cheers,
Paul
Re: glibc update 2.31 to 2.38
Hi Peter,

> On 19 Sep 2024, at 19:43, Peter J. Holzer wrote:
>
> I wrote a small script[1] which prints all unicode code points and a few
> selected[2] longer strings in order. If you run that before and after
> the upgrade and the output doesn't change, you are probably fine.
> (It checks only the default collation, though: If you have indexes using
> a different collation you would have to modify the script accordingly.)
>
> If there are differences, closer inspection might show that the changes
> don't affect you. But I would reindex all indexes on text (etc.) columns
> just to be sure.
>
>        hp
>
> [1] https://git.hjp.at:3000/hjp/pgcollate
> [2] The selection is highly subjective and totally unscientific.
>     Additions are welcome.

I'm not a Python specialist, but I take it that the script needs psycopg2, which we probably don't have. So I'd have to build some sort of venv around that, like I had to do to get Patroni working on our systems. Well, we'll see.

Thanks for this script.

Cheers,
Paul
Re: glibc update 2.31 to 2.38
On 9/19/24 13:56, Paul Foerster wrote:
>> On 19 Sep 2024, at 17:14, Tom Lane wrote:
>>
>> Maybe.  We don't really track glibc changes, so I can't say for sure,
>> but it might be advisable to reindex indexes on string columns.
>
> Advisable is a word I unfortunately can't do much with. We have terabytes and
> terabytes of data in hundreds of databases each having potentially hundreds
> of columns that are candidates. Just reindexing and taking down applications
> during that time is not an option in a 24x7 high availability environment.

See my thread-adjacent email, but suffice to say that if there are collation differences that do affect your tables/data, and you allow any inserts or updates, you may wind up with corrupted data (e.g. duplicate data in your otherwise unique indexes/primary keys).

For more examples about that see
https://joeconway.com/presentations/glibc-SCaLE21x-2024.pdf

A potential alternative for you (discussed at the end of that presentation) would be to create a new branch based on your original SLES 15.5 glibc RPM, equivalent to this:

https://github.com/awslabs/compat-collation-for-glibc/tree/2.17-326.el7

There is likely a non-trivial amount of work involved (the port from the AL2 rpm to the RHEL7 rpm took me the better part of a couple of days), but once done your collation is frozen to the specific version you had on 15.5.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: glibc update 2.31 to 2.38
Hi Joe,

> On 19 Sep 2024, at 20:09, Joe Conway wrote:
>
> See my thread-adjacent email, but suffice to say that if there are collation
> differences that do affect your tables/data, and you allow any inserts or
> updates, you may wind up with corrupted data (e.g. duplicate data in your
> otherwise unique indexes/primary keys).

Yes, I know that.

> For more examples about that see
> https://joeconway.com/presentations/glibc-SCaLE21x-2024.pdf

A very interesting PDF. Thanks very much.

> A potential alternative for you (discussed at the end of that presentation)
> would be to create a new branch based on your original SLES 15.5 glibc RPM,
> equivalent to this:
>
> https://github.com/awslabs/compat-collation-for-glibc/tree/2.17-326.el7
>
> There is likely a non-trivial amount of work involved (the port from the AL2
> rpm to the RHEL7 rpm took me the better part of a couple of days), but once
> done your collation is frozen to the specific version you had on 15.5.

I'm not a developer. I have one machine which is equivalent to all the other servers, except that it has gcc, make and some other things for me to build PostgreSQL. I can't make the admins run an RPM on all servers. I can obviously put a library into the /path/2/postgres/software/lib64 directory, but not into the system.

Also, my build server does not have internet access, so things like git clone would be an additional show stopper. Unfortunately, I'm pretty limited.

Cheers,
Paul