Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Thank you for your answer, Tom. I'd like to have this option only for the initial loading of huge amounts of data, where atomicity and consistency are not needed at all. Maybe an option on the startup command, just for an initial-import mode, would be nice. I had huge problems on a server 3 weeks after a 6 TB migration from another DB. I think it's sad to rewrite all the data twice.

On Thu, Feb 13, 2025 at 4:08 PM, Tom Lane wrote:
> Sébastien writes:
> > Implementation details:
> >
> > - A new INSERT FROZEN option could be introduced, similar to COPY FREEZE,
> > allowing direct insertion of tuples in a frozen state.
> > - This would likely require changes in heap storage logic to ensure
> > tuples are written with a frozen XID at insert time.
> > - Consideration should be given to transaction semantics and WAL logging
> > to ensure consistency and crash recovery integrity.
>
> That last is exactly why this won't happen. A frozen tuple would be
> considered committed and visible the instant it appears in the table,
> thus completely breaking both atomicity and integrity of the
> transaction.
>
> There has been work going on recently to reduce the impact of freezing
> massive amounts of data by spreading the work more effectively [1].
> I don't say that that particular commit has completely solved the
> problem, but I think that continued effort in that direction is more
> likely to yield usable results than what you're suggesting.
>
> BTW, this might or might not be usable in your particular workflow,
> but: there have long been some optimizations for data load into a
> table created in the same transaction. The idea there is that if the
> transaction rolls back, the table will never have been visible to any
> other transaction at all, so that maintaining atomicity/integrity of
> its contents is moot.
>
> regards, tom lane
>
> [1] https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=052026c9b

--
Sébastien Caunes
+33 6 7 229 229 7
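Tom's last point, the long-standing optimization for loading into a table created in the same transaction, can be sketched as follows. Table and file names are made up; this only illustrates the documented COPY FREEZE prerequisite, not a tested recipe:

```sql
BEGIN;
-- COPY FREEZE requires the target table to have been created or
-- truncated in the current (sub)transaction: if the transaction rolls
-- back, the table was never visible to any other transaction, so
-- atomicity/integrity of its contents is moot.
CREATE TABLE big_import (id bigint, payload text);
COPY big_import FROM '/data/dump.csv' WITH (FORMAT csv, FREEZE true);
COMMIT;
```

Outside such a transaction, the FREEZE option is rejected with an error, which is exactly the safety property Tom describes.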
Fwd: [Feature Request] Per-Database Transaction Logs for Enhanced Isolation and New Capabilities
Introduce per-database transaction logs (WAL) and transaction ID spaces to improve database isolation, enable hot-mounting/unmounting, selective replication, and open new possibilities in PostgreSQL.

Business Use-case:

With modern SSDs offering high throughput and low latency, maintaining a single *global* transaction log across all databases in a PostgreSQL instance is becoming an unnecessary constraint. By allowing *each database to have its own transaction log and transaction ID space*, PostgreSQL could achieve significant improvements in performance, isolation, and flexibility.

*Key Benefits*:

- *Better isolation between databases*:
  - A long-running transaction in one database would no longer prevent vacuuming of tables in another.
  - No risk of transaction wraparound issues in one database affecting others.
- *Hot-mounting/unmounting databases*:
  - Ability to attach/detach databases dynamically at the filesystem level without impacting the rest of the cluster.
  - Faster database restores and migrations by simply copying database files and starting the instance.
- *Selective replication*:
  - Currently, logical replication can be done at the table level, but physical replication applies to the entire cluster.
  - With per-database WAL, it would be possible to *replicate only specific databases* without requiring complex logical replication setups.
- *More flexible backup & restore*:
  - Ability to back up and restore *individual databases* with transaction consistency, instead of full-cluster backups.
  - Faster recovery and better disaster recovery options.
- *Better integration with cloud and containerized environments*:
  - Would enable dynamically adding and removing databases in cloud environments without cluster-wide restarts.

User impact with the change:

- Users with large multi-database clusters would see *better transaction isolation*, fewer maintenance conflicts, and *more flexible database management*.
- Organizations running *multi-tenant* environments or *per-database replication* setups would gain *easier and more efficient ways to manage databases*.
- PostgreSQL would become much more *modular and cloud-friendly*, aligning it with modern high-availability and container-based deployments.

Implementation details:

- Requires modifying PostgreSQL's WAL and transaction system to support per-database transaction logs.
- WAL archiving, replication, and recovery logic would need adjustments to support per-database operations.
- Needs careful handling of catalog metadata (such as pg_database) to ensure atomicity when attaching/detaching databases.

Estimated Development Time: I do not know PostgreSQL's internal architecture well enough to assess the full impact of such a change. However, taking a step back, it seems that rather than deeply modifying the core engine, an alternative approach could be to spawn a separate PostgreSQL engine per database. In this case, the main entry point would act more like a connection bouncer, routing requests to individual database engines.

Opportunity Window Period: As SSD and cloud-based infrastructures become the norm, this change would provide *major competitive advantages* for PostgreSQL in multi-tenant, high-performance, and cloud-native use cases.

Budget Money: ...

Contact Information: Sebastien Caunes sebast...@pixseed.fr
[Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
One-line Summary: Introduce an INSERT FROZEN feature to bypass vacuum processing for large-scale cold data imports, reducing the impact on system performance post-import, for large imports, migrations, and major version upgrades.

Business Use-case:

When importing massive datasets (e.g., 6-10 TB) into PostgreSQL on a heavily loaded server, we observed that the system struggled significantly weeks later when the autovacuum process had to freeze all the imported data pages. This led to severe performance degradation, requiring manual intervention to prioritize vacuum jobs to complete them as quickly as possible.

This issue is particularly critical during database *migrations* or *version upgrades*, where a full data reload is often necessary. Each time a major PostgreSQL upgrade occurs, users must reimport large datasets, leading to the same problem of vacuum storms post-import. An INSERT FROZEN feature would allow importing data that is known to be immutable, preventing unnecessary vacuum overhead and reducing system strain.

User impact with the change:

- Users importing large, cold datasets (initial loads, migrations, version upgrades) can mark them as "frozen" during insertion, so pages are directly marked as frozen.
- Reduced risk of autovacuum storms weeks after large imports.
- More predictable system performance post-import and post-upgrade.
- Avoids unnecessary rewriting of all pages afterwards.
- Significant improvement for users who perform regular major version upgrades and need to reload data.

Implementation details:

- A new INSERT FROZEN option could be introduced, similar to COPY FREEZE, allowing direct insertion of tuples in a frozen state.
- This would likely require changes in heap storage logic to ensure tuples are written with a frozen XID at insert time.
- Consideration should be given to transaction semantics and WAL logging to ensure consistency and crash recovery integrity.
Estimated Development Time: Unknown (would require input from core developers to assess complexity). But I think it's not that much, and pretty straightforward to implement experimentally.

Opportunity Window Period: ...

Budget Money: ...

Contact Information: If you have further questions regarding the issues I experienced that this would solve, feel free to contact me. Sébastien Caunes bokan...@gmail.com

Thank you for your attention.

--
Sébastien Caunes
+33 6 7 229 229 7
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
OK, you mean that xid64 will remove the need for freezing... that's one way to see it.

On Tue, Feb 18, 2025 at 3:57 PM, Greg Sabino Mullane wrote:
> On Tue, Feb 18, 2025 at 9:17 AM Sébastien wrote:
>
>> Sorry it won't work. It just delays the problem. But still the freeze
>> procedure must rewrite all pages.
>
> Actually, a 64-bit transaction ID allows for quite a "delay" - like
> hundreds of millions of years at your current rate. :)
>
> (Yes, there are other reasons to vacuum, and other limits and problems
> would arise. You'd have 99 problems, but a vacuum freeze ain't one.)
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support

--
Sébastien Caunes
+33 6 7 229 229 7
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Hello wenhui,

It's not a problem with heavy-insert tables but with heavy deletes. Also, triggering more frequent autovacuum will not help, because autovacuum does not delete recently dead tuples while a large and slow vacuum freeze operation older than their deletion is still running in parallel. The solution was to increase the priority and speed of the vacuum freeze operation.

Anyway, there should be a way to insert frozen data other than COPY, which does not work with foreign tables (INSERT INTO my_table SELECT * FROM foreign_table).

On Mon, Feb 17, 2025 at 9:46 AM, wenhui qiu wrote:
> Hi Sébastien
>     You can check out the email subject "Trigger more frequent autovacuums
> of heavy insert tables"; I think it can alleviate the problem
>
> Thanks
>
> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund wrote:
>
>> Hi,
>>
>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>> > large-scale cold data imports, reducing the impact on system performance
>> > post-import. For large imports, migrations and major version upgrades.
>> > Business Use-case:
>> >
>> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
>> > loaded server, we observed that the system struggled significantly weeks
>> > later when the autovacuum process had to freeze all the imported data
>> > pages. This led to severe performance degradation, requiring manual
>> > intervention to prioritize vacuum jobs to complete them as quickly as
>> > possible.
>>
>> What version of postgres was this? What batch sizes do you need to support?
>> I.e. is all of this data inserted at once, or in steps?
>>
>> As already discussed, it seems unlikely that we'll ever support INSERT FROZEN,
>> due to the potential of causing concurrent queries to give bogus answers. But
>> there's actually a lot we can do to improve this short of INSERT FROZEN.
>>
>> The reason I asked for the version is that the behaviour would e.g. likely be
>> worse before autovacuum_vacuum_insert_scale_factor existed. We are working on
>> improvements around that in 18 too, ensuring that the gap between insert
>> triggered vacuums does not grow forever.
>>
>> Several recent releases have also improved the situation around this in other
>> ways, e.g. by just making vacuuming faster and by avoiding doing redundant
>> work in more cases (by increasing relfrozenxid more aggressively).
>>
>> We've also been talking about performing freezing during e.g. checkpoints, if
>> possible.
>>
>> If you're inserting all the data in a single transaction however, it'll be
>> hard to improve most of this, because while that long long transaction runs,
>> we can't do anything that needs to know the transaction has finished. OTOH,
>> if it were a single transaction, you could already use COPY FREEZE.
>>
>> A somewhat related issue is that bulk INSERTs, in contrast to COPY, currently
>> do not use the bulk-insert logic, leading the INSERT to cause a lot more WAL
>> to be emitted compared to inserting the same data via COPY.
>>
>> > This issue is particularly critical during database *migrations* or *version
>> > upgrades*, where a full data reload is often necessary. Each time a major
>> > PostgreSQL upgrade occurs, users must reimport large datasets, leading to
>> > the same problem of vacuum storms post-import. An INSERT FROZEN feature
>> > would allow importing data that is known to be immutable, preventing
>> > unnecessary vacuum overhead and reducing system strain.
>>
>> What are you using for such upgrades or migrations? I'd not expect INSERT to
>> be used, due to the overhead that has compared to COPY.
>>
>> Greetings,
>>
>> Andres Freund

--
Sébastien Caunes
+33 6 7 229 229 7
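Since COPY ... FROM cannot read directly from a foreign table, one possible workaround for the INSERT-from-foreign-table case Sébastien mentions is to spool the data through COPY's query form and reload it with FREEZE. Object and file names here are hypothetical; this is a sketch, not a tested recipe:

```sql
-- COPY ... TO accepts an arbitrary query, so foreign data can be
-- spooled to a local file first...
COPY (SELECT * FROM foreign_table) TO '/tmp/cold_data.csv' (FORMAT csv);

-- ...and then reloaded with FREEZE, provided the target table is
-- created or truncated in the same transaction, as COPY FREEZE requires.
BEGIN;
TRUNCATE my_table;
COPY my_table FROM '/tmp/cold_data.csv' (FORMAT csv, FREEZE true);
COMMIT;
```

This trades one extra pass over the data on disk for avoiding the later freeze rewrite of every page.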
Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations
Sorry it won't work. It just delays the problem. But still the freeze procedure must rewrite all pages.

On Tue, Feb 18, 2025 at 4:12 AM, wenhui qiu wrote:
> Hello Sébastien
>     this case can be solved by xid64, but it seems like very few people
> are interested. But it seems to me that xid64 should be implemented as soon
> as possible.
>
> Thanks
Re: Timeline ID hexadecimal format
On 31/01/2023 20:16, Greg Stark wrote:
> A hint or something just in that case might be enough?

It seems to be a -1; let's try to improve the documentation, with the attached patch.

best regards,

--
Sébastien

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
     you like, add comments to a history file to record your own notes about
     how and why this particular timeline was created. Such comments will be
     especially valuable when you have a thicket of different timelines as
-    a result of experimentation.
+    a result of experimentation. In both WAL segment file names and history files,
+    the timeline ID number is expressed in hexadecimal.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..3b5d041d92 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     current when the base backup was taken. The value latest recovers
     to the latest timeline found in the archive, which is useful in
-    a standby server. latest is the default.
+    a standby server. A numerical value expressed in hexadecimal must be
+    prefixed with 0x, for example 0x11.
+    latest is the default.
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 343f0482a9..4ae8f2ebdd 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -215,7 +215,8 @@ PostgreSQL documentation
     Timeline from which to read WAL records. The default is to use the
     value in startseg, if that is specified; otherwise, the
-    default is 1.
+    default is 1. The value must be expressed in decimal, contrary to the hexadecimal
+    value given in WAL segment file names and history files.
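For reference, this is how the hexadecimal form documented by the patch would look in practice; the timeline value is a made-up example:

```
# postgresql.conf -- recovery settings (hypothetical example)
# The WAL file names encode timeline 0x11 (decimal 17); per the patch's
# documentation, the hexadecimal form must carry the 0x prefix:
recovery_target_timeline = '0x11'    # equivalent to '17'
```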
Re: Timeline ID hexadecimal format
On 02/03/2023 09:12, Peter Eisentraut wrote:
> On 24.02.23 17:27, Sébastien Lardière wrote:
>> diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
>> index be05a33205..7e26b51031 100644
>> --- a/doc/src/sgml/backup.sgml
>> +++ b/doc/src/sgml/backup.sgml
>> @@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
>>      you like, add comments to a history file to record your own notes about
>>      how and why this particular timeline was created. Such comments will be
>>      especially valuable when you have a thicket of different timelines as
>> -    a result of experimentation.
>> +    a result of experimentation. In both WAL segment file names and history files,
>> +    the timeline ID number is expressed in hexadecimal.
>
> I think here it would be more helpful to show actual examples. Like,
> here is a possible file name, this is what the different parts mean.

So you mean explaining the WAL filename and the history filename? Is this the right place for it?

>> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
>> index e5c41cc6c6..3b5d041d92 100644
>> --- a/doc/src/sgml/config.sgml
>> +++ b/doc/src/sgml/config.sgml
>> @@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
>>      current when the base backup was taken. The value latest recovers
>>      to the latest timeline found in the archive, which is useful in
>> -    a standby server. latest is the default.
>> +    a standby server. A numerical value expressed in hexadecimal must be
>> +    prefixed with 0x, for example 0x11.
>> +    latest is the default.
>
> This applies to all configuration parameters, so it doesn't need to be
> mentioned explicitly for individual ones.

Probably, but is there another parameter with the same consequence? Is it worth documenting this point globally?
>> diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
>> index 343f0482a9..4ae8f2ebdd 100644
>> --- a/doc/src/sgml/ref/pg_waldump.sgml
>> +++ b/doc/src/sgml/ref/pg_waldump.sgml
>> @@ -215,7 +215,8 @@ PostgreSQL documentation
>>      Timeline from which to read WAL records. The default is to use the
>>      value in startseg, if that is specified; otherwise, the
>> -    default is 1.
>> +    default is 1. The value must be expressed in decimal, contrary to the hexadecimal
>> +    value given in WAL segment file names and history files.
>
> Maybe this could be fixed instead?

Indeed, and strtoul is probably a better option than sscanf, don't you think?

--
Sébastien
Re: Timeline ID hexadecimal format
On 06/03/2023 18:04, Peter Eisentraut wrote:
> On 03.03.23 16:52, Sébastien Lardière wrote:
>> On 02/03/2023 09:12, Peter Eisentraut wrote:
>>> I think here it would be more helpful to show actual examples. Like,
>>> here is a possible file name, this is what the different parts mean.
>>
>> So you mean explaining the WAL filename and the history filename? Is
>> this the right place for it?
>
> Well, your patch says, by the way, the timeline ID in the file is
> hexadecimal. Then one might ask, what file, what is a timeline, what
> are the other numbers in the file, etc. It seems very specific in this
> context. I don't know if the format of these file names is actually
> documented somewhere.

Well, in the context of this patch, the usage of both filenames is explained just before, so it seems understandable to me. Timelines are explained in this place: https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES so the patch explains the format there.

>>> This applies to all configuration parameters, so it doesn't need to
>>> be mentioned explicitly for individual ones.
>>
>> Probably, but is there another parameter with the same consequence?
>> Is it worth documenting this point globally?
>
> It's ok to mention it again. We do something similar for example at
> unix_socket_permissions. But maybe with more context, like "If you
> want to specify a timeline ID hexadecimal (for example, if extracted
> from a WAL file name), then prefix it with a 0x".

Ok, I've improved the message.

>>> Maybe this could be fixed instead?
>>
>> Indeed, and strtoul is probably a better option than sscanf, don't
>> you think?
>
> Yeah, the use of sscanf() is kind of weird here. We have been moving
> the option parsing to use option_parse_int(). Maybe hex support could
> be added there. Or just use strtoul().

I've made the change with strtoul.

About option_parse_int(): actually, strtoint() is used; do we need an option_parse_ul() function?
patch attached,

best regards,

--
Sébastien

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..fb86a3fec5 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
     you like, add comments to a history file to record your own notes about
     how and why this particular timeline was created. Such comments will be
     especially valuable when you have a thicket of different timelines as
-    a result of experimentation.
+    a result of experimentation. The timeline identifier is an integer which
+    is used in hexadecimal format in both WAL segment file names and history files.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..6c0d63b73d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     current when the base backup was taken. The value latest recovers
     to the latest timeline found in the archive, which is useful in
-    a standby server. latest is the default.
+    a standby server. A numerical value expressed in hexadecimal, for example
+    from a WAL filename or history file, must be prefixed with 0x,
+    for example 0x11 if the WAL filename
+    is 001100A1004F.
+    latest is the default.
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 343f0482a9..d92948c68a 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -215,7 +215,9 @@ PostgreSQL documentation
     Timeline from which to read WAL records. The default is to use the
     value in startseg, if that is specified; otherwise, the
-    default is 1.
+    default is 1. The value can be expressed in decimal or hexadecimal format.
+    The hexadecimal format, as given in WAL filenames, must be preceded by 0x,
+    for example 0x11.
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 44b5c8726e..b29d5223ce 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -770,6 +770,7 @@ usage(void)
 	printf(_("  -R, --relation=T/D/R   only show records that modify blocks in relation T/D/R\n"));
 	printf(_("  -s, --start=RECPTR     start reading at WAL location RECPTR\n"));
 	printf(_("  -t, --timeline=TLI     timeline from which to read WAL records\n"
+			 "                         hexadecimal value, from WAL filename, must be preceded by 0x\n"
 			 "                         (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -w, --fullpage         only show records wi
Re: Timeline ID hexadecimal format
On 20/03/2023 09:17, Peter Eisentraut wrote:
> I have committed the two documentation changes, with some minor
> adjustments.

Thank you,

> On 07.03.23 18:14, Sébastien Lardière wrote:
>>>> Maybe this could be fixed instead?
>>>
>>> Indeed, and strtoul is probably a better option than sscanf, don't
>>> you think?
>>
>> I've made the change with strtoul
>>
>> About option_parse_int(), actually, strtoint() is used, do we need a
>> option_parse_ul() function?
>
> For the option parsing, I propose the attached patch. This follows the
> structure of option_parse_int(), so in the future it could be extracted
> and refactored in the same way, if there is more need.

OK for me: it accepts 0x values and refuses wrong values.

thank you,

--
Sébastien
Re: Timeline ID hexadecimal format
On 21/03/2023 08:15, Peter Eisentraut wrote:
> On 20.03.23 10:40, Sébastien Lardière wrote:
>>>> About option_parse_int(), actually, strtoint() is used, do we need a
>>>> option_parse_ul() function?
>>>
>>> For the option parsing, I propose the attached patch. This follows
>>> the structure of option_parse_int(), so in the future it could be
>>> extracted and refactored in the same way, if there is more need.
>>
>> ok for me, it accepts 0x values and refuses wrong values
>
> committed

thanks,

--
Sébastien
Re: planner chooses incremental but not the best one
On 15/12/2023 09:58, Richard Guo wrote: On Thu, Dec 14, 2023 at 6:02 PM Richard Guo wrote: It seems that we need to improve estimate of distinct values in estimate_num_groups() when taking the selectivity of restrictions into account. In 84f9a35e3 we changed to a new formula to perform such estimation. But that does not apply to the case here, because for an appendrel, set_append_rel_size() always sets "raw tuples" count equal to "rows", and that would make estimate_num_groups() skip the adjustment of the estimate using the new formula. I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'. Why don't we set it to the accumulated estimate of tuples from each live child, like attached? I believe this aligns more closely with reality. And this would also allow us to adjust the estimate for the number of distinct values in estimate_num_groups() for appendrels using the new formula introduced in 84f9a35e3. As I experimented, this can improve the estimate for appendrels. For instance, create table t (a int, b int, c float) partition by range(a); create table tp1 partition of t for values from (0) to (1000); create table tp2 partition of t for values from (1000) to (2000); insert into t select i%2000, (10 * random())::int, random() from generate_series(1,100) i; analyze t; explain analyze select b from t where c < 0.1 group by b; -- on master HashAggregate (cost=18659.28..19598.74 rows=93946 width=4) (actual time=220.760..234.439 rows=63224 loops=1) -- on patched HashAggregate (cost=18659.28..19294.25 rows=63497 width=4) (actual time=235.161..250.023 rows=63224 loops=1) With the patch the estimate for the number of distinct 'b' values is more accurate. BTW, this patch does not change any existing regression test results. I attempted to devise a regression test that shows how this change can improve query plans, but failed. Should I try harder to find such a test case? Hi, thank you for the patch ; I've tried it and it works with the scenario you provide. 
As Nicolas's co-worker, I've been involved in this case, but, unfortunately, we're not able to test the patch with the actual data for the moment; I'll ask the real owner for a dump.

About the regression test, I don't know how to implement it either.

best regards,

--
Sébastien
Timeline ID hexadecimal format
Hi,

I've been puzzled by this message:

~~~
LOG: fetching timeline history file for timeline 17 from primary server
FATAL: could not receive timeline history file from the primary server: ERROR: could not open file "pg_xlog/0011.history": No such file or directory
~~~

It took me a while to understand that the timeline id 11 in hexadecimal is the same as the timeline id 17 in decimal. It appears that the first message is formatted with %u instead of %X, and there are some other places with the same format, while WAL filenames and history files use hexadecimal.

There is another place where the timeline id is used: pg_waldump. In this tool, the timeline id (-t or --timeline) should be given in decimal, while filenames give it in hexadecimal. IMHO, it's not user-friendly, and can lead users to enter a wrong timeline id.

The attached patch proposes to change the format of the timeline id from %u to %X.

Regarding .po files, I don't know how to manage them. Is there any routine to spread the modifications? Or should I identify and change each message?
best regards,

--
Sébastien

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..77b0fbc962 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 		appendStringInfo(buf, "redo %X/%X; "
-						 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+						 "tli %X; prev tli %X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 						 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 						 "oldest/newest commit timestamp xid: %u/%u; "
 						 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli %X; prev tli %X; time %s",
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index aa6c929477..97ff6e80b6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1329,7 +1329,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 		XLogFileName(fname, state->seg.ws_tli, segno, state->segcxt.ws_segsize);
 		report_invalid_record(state,
-							  "out-of-sequence timeline ID %u (after %u) in WAL segment %s, LSN %X/%X, offset %u",
+							  "out-of-sequence timeline ID %X (after %X) in WAL segment %s, LSN %X/%X, offset %u",
 							  hdr->xlp_tli,
 							  state->latestPageTLI,
 							  fname,
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 2a5352f879..c814ef2767 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3080,7 +3080,7 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 		XLogFileName(fname, xlogreader->seg.ws_tli, segno, wal_segment_size);
 		ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
-				(errmsg("unexpected timeline ID %u in WAL segment %s, LSN %X/%X, offset %u",
+				(errmsg("unexpected timeline ID %X in WAL segment %s, LSN %X/%X, offset %u",
 						xlogreader->latestPageTLI,
 						fname,
 						LSN_FORMAT_ARGS(xlogreader->latestPagePtr),
@@ -3719,7 +3719,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
 					if (curFileTLI > 0 && tli < curFileTLI)
-						elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
+						elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %X, but previous recovered WAL file came from timeline %X",
							 LSN_FORMAT_ARGS(tliRecPtr),
							 tli, curFileTLI);
 				}
@@ -4019,7 +4019,7 @@ rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
 	if (!found)
 	{
 		ereport(LOG,
-				(errmsg("new timeline %u is not a child of database system timeline %u",
+				(errmsg("new timeline %X is not a child of database system timeline %X",
						 newtarget,
						 replayTLI)));
 		return false;
@@ -4033,7 +4033,7 @@ rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
 	if (currentTle->end < replayLSN)
 	{
 		ereport(LOG,
-				(errmsg("new timeline %u forked off current database system timeline %u before current recovery point %X/%X&q
Re: Timeline ID hexadecimal format
On 27/01/2023 15:55, Peter Eisentraut wrote:
> On 27.01.23 14:52, Sébastien Lardière wrote:
>> The attached patch proposes to change the format of timelineid from %u to %X.
>
> I think your complaint has merit. But note that if we did a change like this, then log files or reports from different versions would have different meaning without a visual difference, which is kind of what you complained about in the first place. At least we should do something like 0x%X.

Indeed, but the messages that puzzled me appeared together in a single log file, not across different versions. But yes, it should be documented somewhere; actually, I can't find a good place for that.

While digging, it seems that recovery_target_timeline must be given in decimal, not in hexadecimal, which seems odd to me; pg_controldata uses decimal too, not hexadecimal… So, if this idea is correct, the given patch is not enough. Anyway, do you think it is a good idea or not?

>> Regarding .po files, I don't know how to manage them. Is there any routine to spread the modifications? Or should I identify and change each message?
>
> Don't worry about this. This is handled elsewhere.

Nice, thanks.

regards,

--
Sébastien
Re: Timeline ID hexadecimal format
On 27/01/2023 15:55, Peter Eisentraut wrote:
> On 27.01.23 14:52, Sébastien Lardière wrote:
>> The attached patch proposes to change the format of timelineid from %u to %X.
>
> I think your complaint has merit. But note that if we did a change like this, then log files or reports from different versions would have different meaning without a visual difference, which is kind of what you complained about in the first place. At least we should do something like 0x%X.

Hi,

Here's the patch with the suggested format; plus, I added a note to the documentation about recovery_target_timeline, because I don't see how strtoul(), with the special base-0 parameter, can work without the 0x prefix; I suppose that nobody uses it.

I also changed pg_controldata and the use of its output by pg_upgrade. I left internal usages unchanged: the content of the backup manifest and the content of history files.

Should I open a commitfest entry, or is it too soon?

regards,

--
Sébastien

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
    you like, add comments to a history file to record your own notes about
    how and why this particular timeline was created. Such comments will be
    especially valuable when you have a thicket of different timelines as
-   a result of experimentation.
+   a result of experimentation. In both WAL segment file names and history files,
+   the timeline ID number is expressed in hexadecimal.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1cf53c74ea..508774cfee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
    current when the base backup was taken. The
    value latest recovers
    to the latest timeline found in the archive, which is useful in
-   a standby server. latest is the default.
+   a standby server. A numerical value expressed in hexadecimal must be
+   prefixed with 0x, for example 0x11.
+   latest is the default.
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..bdbe993877 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 		appendStringInfo(buf, "redo %X/%X; "
-						 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+						 "tli 0x%X; prev tli 0x%X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 						 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 						 "oldest/newest commit timestamp xid: %u/%u; "
 						 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli 0x%X; prev tli 0x%X; time %s",
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..c22cf4b2a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7819,7 +7819,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (checkPoint.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-					(errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
+					(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in shutdown checkpoint record",
							checkPoint.ThisTimeLineID, replayTLI)));
 		RecoveryRestartPoint(&checkPoint, record);
@@ -7906,7 +7906,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (xlrec.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-					(errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
+					(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in end-of-recovery record",
Re: Timeline ID hexadecimal format
On 31/01/2023 12:26, Ashutosh Bapat wrote:
> On Mon, Jan 30, 2023 at 9:35 PM Sébastien Lardière wrote:
>> On 27/01/2023 15:55, Peter Eisentraut wrote:
>>> On 27.01.23 14:52, Sébastien Lardière wrote:
>>>> The attached patch proposes to change the format of timelineid from %u to %X.
>>>
>>> I think your complaint has merit. But note that if we did a change like this, then log files or reports from different versions would have different meaning without a visual difference, which is kind of what you complained about in the first place. At least we should do something like 0x%X.
>>
>> Hi, Here's the patch with the suggested format; plus, I added a note to the documentation about recovery_target_timeline, because I don't see how strtoul(), with the special base-0 parameter, can work without the 0x prefix; I suppose that nobody uses it. I also changed pg_controldata and the use of its output by pg_upgrade. I left internal usages unchanged: the content of the backup manifest and the content of history files.
>
> The patch seems to have some special/unprintable characters in it. I see a lot of ^[[ in there. I can't read the patch because of that.

Sorry for that: it was the --color option of git diff. It's fixed, I hope. Thank you.

regards,

--
Sébastien

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
    you like, add comments to a history file to record your own notes about
    how and why this particular timeline was created. Such comments will be
    especially valuable when you have a thicket of different timelines as
-   a result of experimentation.
+   a result of experimentation. In both WAL segment file names and history files,
+   the timeline ID number is expressed in hexadecimal.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1cf53c74ea..508774cfee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
    current when the base backup was taken. The
    value latest recovers
    to the latest timeline found in the archive, which is useful in
-   a standby server. latest is the default.
+   a standby server. A numerical value expressed in hexadecimal must be
+   prefixed with 0x, for example 0x11.
+   latest is the default.
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..bdbe993877 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 		appendStringInfo(buf, "redo %X/%X; "
-						 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+						 "tli 0x%X; prev tli 0x%X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 						 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 						 "oldest/newest commit timestamp xid: %u/%u; "
 						 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli 0x%X; prev tli 0x%X; time %s",
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..c22cf4b2a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7819,7 +7819,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (checkPoint.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-					(errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
+					(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in shutdown checkpoint record",
							checkPoint.ThisTimeLineID, replayTLI)));
 		RecoveryRestartPoint(&checkPoint, record);
@@ -7906,7 +7906,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (xlrec.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-					(errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
+					(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in end-of-recovery record",
							xlrec.ThisTimeLineID, replayTLI)));
 	}
 	else if (info == XLOG_NOOP)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index aa6c929477..1643d0d98c 100644
--- a/src/backend/a
Re: Timeline ID hexadecimal format
On 31/01/2023 10:53, Peter Eisentraut wrote:
> On 30.01.23 17:05, Sébastien Lardière wrote:
>> Here's the patch with the suggested format; plus, I added a note to the documentation about recovery_target_timeline, because I don't see how strtoul(), with the special base-0 parameter, can work without the 0x prefix; I suppose that nobody uses it. I also changed pg_controldata and the use of its output by pg_upgrade. I left internal usages unchanged: the content of the backup manifest and the content of history files. Should I open a commitfest entry, or is it too soon?
>
> It is not too soon. (The next commitfest is open for new patch submissions as soon as the current one is "in progress", which closes it for new patches.)

Done: https://commitfest.postgresql.org/42/4155/

--
Sébastien
Re: Timeline ID hexadecimal format
On 31/01/2023 20:16, Greg Stark wrote:
> The fact that the *filename* has it encoded in hex is an implementation detail and really gets exposed here because it's giving you the underlying system error that caused the problem.

It's an implementation detail, but an exposed one, so people refer to the file name to find the timeline ID (that's how it happened to me).

> The confusion only arises when the two are juxtaposed. A hint or something just in that case might be enough?

Thanks, I get your point. Note that my proposal was to remove the ambiguous notation, which arises only in some cases (as with 11 <-> 17). A hint is useless in most cases, because most of the time there is no ambiguity. That's why I thought of formatting it as hexadecimal everywhere.

At least, may I propose to improve the documentation to state that the timeline ID is exposed in hexadecimal in file names but must be given in decimal to recovery_target_timeline and pg_waldump?

regards,

--
Sébastien
Re: PostgreSQL 13 Beta 1 Release Announcement Draft
On 19/05/2020 04:29, Jonathan S. Katz wrote:
> Hi,
>
> Attached is a draft of the release announcement for the PostgreSQL 13
> Beta 1 release this week.

Hi,

Maybe I'm too late, but in this paragraph:

> `psql` now includes the `\warn` command that is similar to the `\echo` command
> in terms of outputting data, except `\warn` sends it to stderr. And in case you
> need additional guidance on any of the PostgreSQL commands, the `--help` flag
> now includes a link to [https://www.postgresql.org](https://www.postgresql.org).

shouldn't `--help` be `/help`? The same applies to the release notes (https://www.postgresql.org/docs/13/release-13.html):

> Add the PostgreSQL home page to command-line |--help| output (Peter Eisentraut)

as it probably refers to 27f3dea64833d68c1fa08c1e5d26176a579f69c8, doesn't it?

regards,

--
Sébastien