Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-13 Thread Sébastien
Thank you for your answer, Tom.
I'd like to have this option only for the initial loading of huge amounts of
data, where no atomicity or consistency is needed at all. Maybe an option on
the startup command, just for an initial import mode, would be nice.

I had huge problems on a server 3 weeks after a 6 TB migration from another DB.
I think it's sad to rewrite all the data twice.
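For context, the same-transaction optimization Tom refers to below is what makes COPY FREEZE safe today; a minimal sketch (table and file names are illustrative, not from the original thread):

```sql
BEGIN;
-- The table is created (or truncated) in this same transaction, so if the
-- transaction rolls back, the table was never visible to anyone else.
CREATE TABLE big_import (id bigint, payload text);
COPY big_import FROM '/data/cold_dump.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
```

If the transaction aborts, the frozen tuples disappear along with the table, which is why the visibility hazard described below does not arise in this case.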

On Thu, Feb 13, 2025 at 4:08 PM, Tom Lane  wrote:

> Sébastien  writes:
> > Implementation details:
>
> >- A new INSERT FROZEN option could be introduced, similar to COPY
> FREEZE,
> >allowing direct insertion of tuples in a frozen state.
> >- This would likely require changes in heap storage logic to ensure
> >tuples are written with a frozen XID at insert time.
> >- Consideration should be given to transaction semantics and WAL
> logging
> >to ensure consistency and crash recovery integrity.
>
> That last is exactly why this won't happen.  A frozen tuple would be
> considered committed and visible the instant it appears in the table,
> thus completely breaking both atomicity and integrity of the
> transaction.
>
> There has been work going on recently to reduce the impact of freezing
> massive amounts of data by spreading the work more effectively [1].
> I don't say that that particular commit has completely solved the
> problem, but I think that continued effort in that direction is more
> likely to yield usable results than what you're suggesting.
>
> BTW, this might or might not be usable in your particular workflow,
> but: there have long been some optimizations for data load into a
> table created in the same transaction.  The idea there is that if the
> transaction rolls back, the table will never have been visible to any
> other transaction at all, so that maintaining atomicity/integrity of
> its contents is moot.
>
> regards, tom lane
>
> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=052026c9b
>


-- 
Sébastien Caunes
+33 6 7 229 229 7


Fwd: [Feature Request] Per-Database Transaction Logs for Enhanced Isolation and New Capabilities

2025-02-13 Thread Sébastien
Introduce per-database transaction logs (WAL) and transaction ID spaces to
improve database isolation, enable hot-mounting/unmounting, selective
replication, and open new possibilities in PostgreSQL.
Business Use-case:

With modern SSDs offering high throughput and low latency, maintaining a
single *global* transaction log across all databases in a PostgreSQL
instance is becoming an unnecessary constraint.

By allowing *each database to have its own transaction log and transaction
ID space*, PostgreSQL could achieve significant improvements in
performance, isolation, and flexibility.
*Key Benefits*:

   - *Better isolation between databases*:
  - A long-running transaction in one database would no longer prevent
  vacuuming of tables in another.
  - No risk of transaction wraparound issues in one database affecting
  others.
   - *Hot-mounting/unmounting databases*:
  - Ability to attach/detach databases dynamically at the filesystem
  level without impacting the rest of the cluster.
  - Faster database restores and migrations by simply copying database
  files and starting the instance.
   - *Selective replication*:
  - Currently, logical replication can be done at the table level, but
  physical replication applies to the entire cluster.
  - With per-database WAL, it would be possible to *replicate only
  specific databases* without requiring complex logical replication
  setups.
   - *More flexible backup & restore*:
  - Ability to back up and restore *individual databases* with
  transaction consistency, instead of full-cluster backups.
  - Faster recovery and better disaster recovery options.
   - *Better integration with cloud and containerized environments*:
   Would enable dynamically adding and removing databases in cloud
   environments without cluster-wide restarts.
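For comparison, the "complex logical replication setup" that per-database WAL would make unnecessary looks roughly like this today (connection string and object names are illustrative):

```sql
-- On the source database:
CREATE PUBLICATION mydb_pub FOR ALL TABLES;

-- On the target server, repeated for each database to replicate:
CREATE SUBSCRIPTION mydb_sub
    CONNECTION 'host=primary dbname=mydb user=repl'
    PUBLICATION mydb_pub;
```

Physical replication, by contrast, can only follow the single cluster-wide WAL stream, which is the constraint this proposal targets.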

User impact with the change:

   - Users with large multi-database clusters would see *better transaction
   isolation*, fewer maintenance conflicts, and *more flexible database
   management*.
   - Organizations running *multi-tenant* environments or *per-database
   replication* setups would gain *easier and more efficient ways to manage
   databases*.
   - PostgreSQL would become much more *modular and cloud-friendly*,
   aligning it with modern high-availability and container-based deployments.

Implementation details:

   - Requires modifying PostgreSQL's WAL and transaction system to support
   per-database transaction logs.
   - WAL archiving, replication, and recovery logic would need adjustments
   to support per-database operations.
   - Needs careful handling of catalog metadata (such as pg_database) to
   ensure atomicity when attaching/detaching databases.

Estimated Development Time:

I do not know PostgreSQL's internal architecture well enough to assess the
full impact of such a change. However, taking a step back, it seems that
rather than deeply modifying the core engine, an alternative approach could
be to spawn a separate PostgreSQL engine per database. In this case, the
main entry point would act more like a connection bouncer, routing requests
to individual database engines.
Opportunity Window Period:

As SSD and cloud-based infrastructures become the norm, this change would
provide *major competitive advantages* for PostgreSQL in multi-tenant,
high-performance, and cloud-native use cases.
Budget Money:

...
Contact Information:

Sebastien Caunes
sebast...@pixseed.fr


[Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-13 Thread Sébastien
One-line Summary:

Introduce an INSERT FROZEN feature to bypass vacuum processing for
large-scale cold data imports, reducing the impact on system performance
post-import; intended for large imports, migrations, and major version upgrades.
Business Use-case:

When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
loaded server, we observed that the system struggled significantly weeks
later when the autovacuum process had to freeze all the imported data
pages. This led to severe performance degradation, requiring manual
intervention to prioritize vacuum jobs to complete them as quickly as
possible.

This issue is particularly critical during database *migrations* or *version
upgrades*, where a full data reload is often necessary. Each time a major
PostgreSQL upgrade occurs, users must reimport large datasets, leading to
the same problem of vacuum storms post-import. An INSERT FROZEN feature
would allow importing data that is known to be immutable, preventing
unnecessary vacuum overhead and reducing system strain.
User impact with the change:

   - Users importing large, cold datasets (initial loads, migrations,
   version upgrades) can mark them as "frozen" during insertion, so pages are
   written directly in a frozen state.
   - Reduced risk of autovacuum storms weeks after large imports.
   - More predictable system performance post-import and post-upgrade.
   - Avoids unnecessary rewriting of all pages after the import.
   - Significant improvement for users who perform regular major version
   upgrades and need to reload data.

Implementation details:

   - A new INSERT FROZEN option could be introduced, similar to COPY FREEZE,
   allowing direct insertion of tuples in a frozen state.
   - This would likely require changes in heap storage logic to ensure
   tuples are written with a frozen XID at insert time.
   - Consideration should be given to transaction semantics and WAL logging
   to ensure consistency and crash recovery integrity.
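A hypothetical syntax sketch of the proposal (this is not existing PostgreSQL syntax; the FROZEN keyword placement and table names are invented for illustration):

```sql
-- Hypothetical: insert tuples pre-frozen, skipping later vacuum freezing.
INSERT FROZEN INTO archive_table
SELECT * FROM staging_table;
```

The closest existing feature is COPY ... WITH (FREEZE), which requires the target table to have been created or truncated in the same transaction.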

Estimated Development Time:

Unknown (would require input from core developers to assess complexity),
but I think it's not that much, and pretty straightforward to implement
experimentally.
Opportunity Window Period:

...
Budget Money:

...
Contact Information:
If you have further questions regarding the issues I experienced that this
feature would solve, feel free to contact me.

Sébastien Caunes bokan...@gmail.com

Thank you for your attention.


-- 
Sébastien Caunes
+33 6 7 229 229 7


Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-19 Thread Sébastien
OK, you mean that xid64 would remove the need for freezing... that's one way
to see it.

On Tue, Feb 18, 2025 at 3:57 PM, Greg Sabino Mullane  wrote:

> On Tue, Feb 18, 2025 at 9:17 AM Sébastien  wrote:
>
>> Sorry, it won't work; it just delays the problem. The freeze procedure
>> must still rewrite all pages.
>>
>
> Actually, a 64-bit transaction ID allows for quite a "delay" - like
> hundreds of millions of years at your current rate. :)
>
> (Yes, there are other reasons to vacuum, and other limits and problems
> would arise. You'd have 99 problems, but a vacuum freeze ain't one.)
>
> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support
>
>

-- 
Sébastien Caunes
+33 6 7 229 229 7


Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-18 Thread Sébastien
Hello wenhui,

It's not a problem with heavy-insert tables, but with heavy deletes. Also,
triggering more frequent autovacuums will not help, because autovacuum does
not remove recently dead tuples while a large and slow vacuum freeze
operation older than their deletion is still running in parallel. The
solution was to increase the priority and speed of the vacuum freeze
operation.

Anyway, there should be a way to insert frozen data other than COPY, which
does not work with foreign tables (INSERT INTO my_table SELECT * FROM
foreign_table).
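One possible workaround under the current restrictions is to stage the foreign data through a local file so the COPY FREEZE path applies (paths and table names are illustrative):

```sql
-- Export the foreign table's contents to a local file...
COPY (SELECT * FROM foreign_table) TO '/tmp/foreign_dump.bin' WITH (FORMAT binary);

-- ...then load it frozen; FREEZE requires the target table to be
-- created or truncated in the same transaction.
BEGIN;
TRUNCATE my_table;
COPY my_table FROM '/tmp/foreign_dump.bin' WITH (FORMAT binary, FREEZE);
COMMIT;
```

This is awkward compared to a direct INSERT ... SELECT, which is exactly the gap this request points at.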



On Mon, Feb 17, 2025 at 9:46 AM, wenhui qiu  wrote:

> Hi Sébastien,
>  You can check out the email with subject "Trigger more frequent
> autovacuums of heavy insert tables"; I think it can alleviate the problem.
>
> Thanks
>
> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund  wrote:
>
>> Hi,
>>
>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>> > large-scale cold data imports, reducing the impact on system performance
>> > post-import. For large imports, migrations and major version upgrades.
>> > Business Use-case:
>> >
>> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
>> heavily
>> > loaded server, we observed that the system struggled significantly weeks
>> > later when the autovacuum process had to freeze all the imported data
>> > pages. This led to severe performance degradation, requiring manual
>> > intervention to prioritize vacuum jobs to complete them as quickly as
>> > possible.
>>
>> What version of postgres was this?  What batch sizes do you need to
>> support?
>> I.e. is all of this data inserted at once, or in steps?
>>
>> As already discussed, it seems unlikely that we'll ever support INSERT
>> FROZEN,
>> due to the potential of causing concurrent queries to give bogus answers.
>> But
>> there's actually a lot we can do to improve this short of INSERT FROZEN.
>>
>> The reason I asked for the version is that the behaviour would e.g.
>> likely be
>> worse before autovacuum_vacuum_insert_scale_factor existed. We are
>> working on
>> improvements around that in 18 too, ensuring that the gap between insert
>> triggered vacuums does not grow forever.
>>
>> Several recent releases have also improved the situation around this in
>> other
>> ways, e.g. by just making vacuuming faster and by avoiding doing redundant
>> work in more cases (by increasing relfrozenzid more aggressively).
>>
>> We've also been talking about performing freezing during e.g.
>> checkpoints, if
>> possible.
>>
>> If you're inserting all the data in a single transaction however, it'll be
>> hard to improve most of this, because while that long long transaction
>> runs,
>> we can't do anything that needs to know the transaction has finished.
>> OTOH,
>> if it were a single transaction, you could already use COPY FREEZE.
>>
>>
>> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
>> currently
>> does not use the bulk-insert logic, leading the INSERT to cause a lot
>> more WAL
>> to be emitted compared to inserting the same data via COPY.
>>
>>
>> > This issue is particularly critical during database *migrations* or
>> *version
>> > upgrades*, where a full data reload is often necessary. Each time a
>> major
>> > PostgreSQL upgrade occurs, users must reimport large datasets, leading
>> to
>> > the same problem of vacuum storms post-import. An INSERT FROZEN feature
>> > would allow importing data that is known to be immutable, preventing
>> > unnecessary vacuum overhead and reducing system strain.
>>
>> What are you using for such upgrades or migrations? I'd not expect INSERT
>> to
>> be used, due to the overhead that has compared to COPY.
>>
>> Greetings,
>>
>> Andres Freund
>>
>>
>>

-- 
Sébastien Caunes
+33 6 7 229 229 7


Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-18 Thread Sébastien
Sorry, it won't work; it just delays the problem. The freeze procedure must
still rewrite all pages.

On Tue, Feb 18, 2025 at 4:12 AM, wenhui qiu  wrote:

> Hello Sébastien
>   This case can be solved by xid64, but it seems like very few people
> are interested. It seems to me that xid64 should be implemented as soon
> as possible.
>
>
> Thanks
>
> On Mon, Feb 17, 2025 at 9:47 PM Sébastien  wrote:
>
>> Hello wenhui,
>>
>> It's not a problem with heavy-insert tables, but with heavy deletes. Also,
>> triggering more frequent autovacuums will not help, because autovacuum does
>> not remove recently dead tuples while a large and slow vacuum freeze
>> operation older than their deletion is still running in parallel. The
>> solution was to increase the priority and speed of the vacuum freeze
>> operation.
>>
>> Anyway, there should be a way to insert frozen data other than COPY, which
>> does not work with foreign tables (INSERT INTO my_table SELECT * FROM
>> foreign_table).
>>
>>
>>
>> On Mon, Feb 17, 2025 at 9:46 AM, wenhui qiu  wrote:
>>
>>> Hi Sébastien,
>>>  You can check out the email with subject "Trigger more frequent
>>> autovacuums of heavy insert tables"; I think it can alleviate the problem.
>>>
>>> Thanks
>>>
>>> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>>>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>>>> > large-scale cold data imports, reducing the impact on system
>>>> performance
>>>> > post-import. For large imports, migrations and major version upgrades.
>>>> > Business Use-case:
>>>> >
>>>> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
>>>> heavily
>>>> > loaded server, we observed that the system struggled significantly
>>>> weeks
>>>> > later when the autovacuum process had to freeze all the imported data
>>>> > pages. This led to severe performance degradation, requiring manual
>>>> > intervention to prioritize vacuum jobs to complete them as quickly as
>>>> > possible.
>>>>
>>>> What version of postgres was this?  What batch sizes do you need to
>>>> support?
>>>> I.e. is all of this data inserted at once, or in steps?
>>>>
>>>> As already discussed, it seems unlikely that we'll ever support INSERT
>>>> FROZEN,
>>>> due to the potential of causing concurrent queries to give bogus
>>>> answers. But
>>>> there's actually a lot we can do to improve this short of INSERT FROZEN.
>>>>
>>>> The reason I asked for the version is that the behaviour would e.g.
>>>> likely be
>>>> worse before autovacuum_vacuum_insert_scale_factor existed. We are
>>>> working on
>>>> improvements around that in 18 too, ensuring that the gap between insert
>>>> triggered vacuums does not grow forever.
>>>>
>>>> Several recent releases have also improved the situation around this in
>>>> other
>>>> ways, e.g. by just making vacuuming faster and by avoiding doing
>>>> redundant
>>>> work in more cases (by increasing relfrozenzid more aggressively).
>>>>
>>>> We've also been talking about performing freezing during e.g.
>>>> checkpoints, if
>>>> possible.
>>>>
>>>> If you're inserting all the data in a single transaction however, it'll
>>>> be
>>>> hard to improve most of this, because while that long long transaction
>>>> runs,
>>>> we can't do anything that needs to know the transaction has finished.
>>>> OTOH,
>>>> if it were a single transaction, you could already use COPY FREEZE.
>>>>
>>>>
>>>> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
>>>> currently
>>>> does not use the bulk-insert logic, leading the INSERT to cause a lot
>>>> more WAL
>>>> to be emitted compared to inserting the same data via COPY.
>>>>
>>>>
>>>> > This issue is particularly critical during database *migrations* or
>>>> *version
>>>> > upgrades*, where a full data reload is often necessary. Each time a
>>>> major
>>>> > PostgreSQL upgrade occurs, users must reimport large datasets,
>>>> leading to
>>>> > the same problem of vacuum storms post-import. An INSERT FROZEN
>>>> feature
>>>> > would allow importing data that is known to be immutable, preventing
>>>> > unnecessary vacuum overhead and reducing system strain.
>>>>
>>>> What are you using for such upgrades or migrations? I'd not expect
>>>> INSERT to
>>>> be used, due to the overhead that has compared to COPY.
>>>>
>>>> Greetings,
>>>>
>>>> Andres Freund
>>>>
>>>>
>>>>
>>
>> --
>> Sébastien Caunes
>> +33 6 7 229 229 7
>>
>


Re: Timeline ID hexadecimal format

2023-02-24 Thread Sébastien Lardière

On 31/01/2023 20:16, Greg Stark wrote:

A hint or something just in
that case might be enough?


It seems to be a -1;

Let's try to improve the documentation with the attached patch.

best regards,

--
Sébastien
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 you like, add comments to a history file to record your own notes about
 how and why this particular timeline was created.  Such comments will be
 especially valuable when you have a thicket of different timelines as
-a result of experimentation.
+a result of experimentation. In both WAL segment file names and history files,
+the timeline ID number is expressed in hexadecimal.

 

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..3b5d041d92 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 current when the base backup was taken.  The
 value latest recovers
 to the latest timeline found in the archive, which is useful in
-a standby server.  latest is the default.
+a standby server. A numerical value expressed in hexadecimal must be
+prefixed with 0x, for example 0x11.
+latest is the default. 

 

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 343f0482a9..4ae8f2ebdd 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -215,7 +215,8 @@ PostgreSQL documentation

 Timeline from which to read WAL records. The default is to use the
 value in startseg, if that is specified; otherwise, the
-default is 1.
+default is 1. The value must be expressed in decimal, contrary to the hexadecimal 
+value given in WAL segment file names and history files.

   
  


Re: Timeline ID hexadecimal format

2023-03-03 Thread Sébastien Lardière

On 02/03/2023 09:12, Peter Eisentraut wrote:

On 24.02.23 17:27, Sébastien Lardière wrote:

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp/mnt/server/archivedir/%f %p'
  you like, add comments to a history file to record your own 
notes about
  how and why this particular timeline was created.  Such 
comments will be
  especially valuable when you have a thicket of different 
timelines as

-    a result of experimentation.
+    a result of experimentation. In both WAL segment file names and 
history files,

+    the timeline ID number is expressed in hexadecimal.
 
   


I think here it would be more helpful to show actual examples. Like, 
here is a possible file name, this is what the different parts mean.


So you mean explaining the WAL filename and the history filename? Is this
the right place for it?






diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..3b5d041d92 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy 
"C:\\server\\archivedir\\%f" "%p"'  # Windows

  current when the base backup was taken.  The
  value latest recovers
  to the latest timeline found in the archive, which is 
useful in

-    a standby server. latest is the default.
+    a standby server. A numerical value expressed in hexadecimal 
must be
+    prefixed with 0x, for example 
0x11.

+    latest is the default.
 
   


This applies to all configuration parameters, so it doesn't need to be 
mentioned explicitly for individual ones.


Probably, but is there another parameter with the same consequence?

Is it worth documenting this point globally?




diff --git a/doc/src/sgml/ref/pg_waldump.sgml 
b/doc/src/sgml/ref/pg_waldump.sgml

index 343f0482a9..4ae8f2ebdd 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -215,7 +215,8 @@ PostgreSQL documentation
 
  Timeline from which to read WAL records. The default is to 
use the
  value in startseg, if that is 
specified; otherwise, the

-    default is 1.
+    default is 1. The value must be expressed in decimal, 
contrary to the hexadecimal

+    value given in WAL segment file names and history files.
 
    
   


Maybe this could be fixed instead?




Indeed, and strtoul() is probably a better option than sscanf(), don't you
think?




--
Sébastien





Re: Timeline ID hexadecimal format

2023-03-07 Thread Sébastien Lardière

On 06/03/2023 18:04, Peter Eisentraut wrote:

On 03.03.23 16:52, Sébastien Lardière wrote:

On 02/03/2023 09:12, Peter Eisentraut wrote:


I think here it would be more helpful to show actual examples. Like, 
here is a possible file name, this is what the different parts mean.


So you mean explain the WAL filename and the history filename ? Is it 
the good place for it ?


Well, your patch says, by the way, the timeline ID in the file is 
hexadecimal.  Then one might ask, what file, what is a timeline, what 
are the other numbers in the file, etc.  It seems very specific in 
this context.  I don't know if the format of these file names is 
actually documented somewhere.



Well, in the context of this patch, the usage of both filenames is
explained just before, so it seems understandable to me.


Timelines are explained here:
https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES,
so the patch explains the format there.









This applies to all configuration parameters, so it doesn't need to 
be mentioned explicitly for individual ones.


Probably, but is there another parameter with the same consequence?

Is it worth documenting this point globally?


It's ok to mention it again.  We do something similar for example at 
unix_socket_permissions.  But maybe with more context, like "If you 
want to specify a timeline ID hexadecimal (for example, if extracted 
from a WAL file name), then prefix it with a 0x".



Ok, I've improved the message






Maybe this could be fixed instead?


Indeed, and strtoul is probably a better option than sscanf, don't 
you think ?


Yeah, the use of sscanf() is kind of weird here.  We have been moving 
the option parsing to use option_parse_int().  Maybe hex support could 
be added there.  Or just use strtoul().



I've made the change with strtoul().

About option_parse_int(): actually, strtoint() is used; do we need an
option_parse_ul() function?


patch attached,

best regards,


--
Sébastien
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..fb86a3fec5 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 you like, add comments to a history file to record your own notes about
 how and why this particular timeline was created.  Such comments will be
 especially valuable when you have a thicket of different timelines as
-a result of experimentation.
+a result of experimentation. The timeline identifier is an integer which
+is used in hexadecimal format in both WAL segment file names and history files.

 

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e5c41cc6c6..6c0d63b73d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 current when the base backup was taken.  The
 value latest recovers
 to the latest timeline found in the archive, which is useful in
-a standby server.  latest is the default.
+a standby server. A numerical value expressed in hexadecimal, for example
+from WAL filename or history file, must be prefixed with 0x,
+for example 0x11 if the WAL filename
+is 001100A1004F.
+latest is the default.

 

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 343f0482a9..d92948c68a 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -215,7 +215,9 @@ PostgreSQL documentation

 Timeline from which to read WAL records. The default is to use the
 value in startseg, if that is specified; otherwise, the
-default is 1.
+default is 1. The value can be expressed in decimal or hexadecimal format.
+The hexadecimal format, as given in WAL file names, must be preceded by 0x,
+for example 0x11.

   
  
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 44b5c8726e..b29d5223ce 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -770,6 +770,7 @@ usage(void)
 	printf(_("  -R, --relation=T/D/R   only show records that modify blocks in relation T/D/R\n"));
 	printf(_("  -s, --start=RECPTR start reading at WAL location RECPTR\n"));
 	printf(_("  -t, --timeline=TLI timeline from which to read WAL records\n"
+			 " hexadecimal value, from WAL filename, must be preceded by 0x\n"
 			 " (default: 1 or the value used in STARTSEG)\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -w, --fullpage only show records wi

Re: Timeline ID hexadecimal format

2023-03-20 Thread Sébastien Lardière

On 20/03/2023 09:17, Peter Eisentraut wrote:
I have committed the two documentation changes, with some minor 
adjustments.



Thank you,




On 07.03.23 18:14, Sébastien Lardière wrote:

Maybe this could be fixed instead?


Indeed, and strtoul is probably a better option than sscanf, don't 
you think ?


Yeah, the use of sscanf() is kind of weird here.  We have been 
moving the option parsing to use option_parse_int().  Maybe hex 
support could be added there.  Or just use strtoul().



I've made the change with strtoul().

About option_parse_int(): actually, strtoint() is used; do we need an
option_parse_ul() function?


For the option parsing, I propose the attached patch.  This follows 
the structure of option_parse_int(), so in the future it could be 
extracted and refactored in the same way, if there is more need.



OK for me; it accepts 0x values and refuses wrong values.

thank you,


--
Sébastien





Re: Timeline ID hexadecimal format

2023-03-21 Thread Sébastien Lardière

On 21/03/2023 08:15, Peter Eisentraut wrote:

On 20.03.23 10:40, Sébastien Lardière wrote:
About option_parse_int(): actually, strtoint() is used; do we need an
option_parse_ul() function?


For the option parsing, I propose the attached patch.  This follows 
the structure of option_parse_int(), so in the future it could be 
extracted and refactored in the same way, if there is more need.



OK for me; it accepts 0x values and refuses wrong values.


committed


thanks,


--
Sébastien





Re: planner chooses incremental but not the best one

2023-12-22 Thread Sébastien Lardière

On 15/12/2023 09:58, Richard Guo wrote:


On Thu, Dec 14, 2023 at 6:02 PM Richard Guo  
wrote:


It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.

In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.


I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached?  I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3.  As I experimented, this can improve
the estimate for appendrels.  For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (10 * random())::int, random() from 
generate_series(1,100) i;

analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
 HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
                (actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
 HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
                (actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.

BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?



Hi,

Thank you for the patch; I've tried it, and it works with the scenario you
provide.

As Nicolas's co-worker, I've been involved in this case. Unfortunately,
we're not able to test the patch with the actual data for the moment, but
I'll ask the real owner for a dump.


About the regression test, I don't know how to implement it either.

best regards,

--
Sébastien


Timeline ID hexadecimal format

2023-01-27 Thread Sébastien Lardière

Hi,

I've been puzzled by this message:

~~~
LOG:  fetching timeline history file for timeline 17 from primary server
FATAL:  could not receive timeline history file from the primary server: 
ERROR:  could not open file "pg_xlog/0011.history": No such file or 
directory

~~~

It took me a while to understand that the timeline id 11 in hexadecimal 
is the same as the timeline id 17 in decimal.


It appears that the first message is formatted with %u instead of %X, and
there are some other places with the same format, while WAL file names and
history files use hexadecimal.


There is another place where the timeline ID is used: pg_waldump. In this
tool, the timeline ID (-t or --timeline) must be given in decimal, while the
file name gives it in hexadecimal. IMHO, this is not user-friendly and can
lead to bad user input for the timeline ID.
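The mismatch can be demonstrated from SQL, since to_hex() renders a decimal timeline ID the way WAL file names spell it:

```sql
SELECT to_hex(17);    -- '11': decimal timeline 17 as spelled in the history file name
SELECT x'11'::int;    -- 17: the hexadecimal ID from a file name, back to decimal
```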


The attached patch proposes to change the format of the timeline ID from %u
to %X.


Regarding .po files, I don't know how to manage them. Is there any routine
to propagate the modifications, or should I identify and change each message?



best regards,

--
Sébastien
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..77b0fbc962 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 
 		appendStringInfo(buf, "redo %X/%X; "
-		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+		 "tli %X; prev tli %X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
 		 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli %X; prev tli %X; time %s",
 		 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 		 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index aa6c929477..97ff6e80b6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1329,7 +1329,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 			XLogFileName(fname, state->seg.ws_tli, segno, state->segcxt.ws_segsize);
 
 			report_invalid_record(state,
-  "out-of-sequence timeline ID %u (after %u) in WAL segment %s, LSN %X/%X, offset %u",
+  "out-of-sequence timeline ID %X (after %X) in WAL segment %s, LSN %X/%X, offset %u",
   hdr->xlp_tli,
   state->latestPageTLI,
   fname,
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 2a5352f879..c814ef2767 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3080,7 +3080,7 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode,
 			XLogFileName(fname, xlogreader->seg.ws_tli, segno,
 		 wal_segment_size);
 			ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
-	(errmsg("unexpected timeline ID %u in WAL segment %s, LSN %X/%X, offset %u",
+	(errmsg("unexpected timeline ID %X in WAL segment %s, LSN %X/%X, offset %u",
 			xlogreader->latestPageTLI,
 			fname,
 			LSN_FORMAT_ARGS(xlogreader->latestPagePtr),
@@ -3719,7 +3719,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
 
 			if (curFileTLI > 0 && tli < curFileTLI)
-elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %u, but previous recovered WAL file came from timeline %u",
+elog(ERROR, "according to history file, WAL location %X/%X belongs to timeline %X, but previous recovered WAL file came from timeline %X",
 	 LSN_FORMAT_ARGS(tliRecPtr),
 	 tli, curFileTLI);
 		}
@@ -4019,7 +4019,7 @@ rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
 	if (!found)
 	{
 		ereport(LOG,
-(errmsg("new timeline %u is not a child of database system timeline %u",
+(errmsg("new timeline %X is not a child of database system timeline %X",
 		newtarget,
 		replayTLI)));
 		return false;
@@ -4033,7 +4033,7 @@ rescanLatestTimeLine(TimeLineID replayTLI, XLogRecPtr replayLSN)
 	if (currentTle->end < replayLSN)
 	{
 		ereport(LOG,
-(errmsg("new timeline %u forked off current database system timeline %u before current recovery point %X/%X&q

Re: Timeline ID hexadecimal format

2023-01-27 Thread Sébastien Lardière

On 27/01/2023 15:55, Peter Eisentraut wrote:

On 27.01.23 14:52, Sébastien Lardière wrote:
The attached patch proposes to change the format of timelineid from 
%u to %X.


I think your complaint has merit.  But note that if we did a change 
like this, then log files or reports from different versions would 
have different meaning without a visual difference, which is kind of 
what you complained about in the first place.  At least we should do 
something like 0x%X.


Indeed, but the messages that puzzled me were in a single log file, 
right next to each other, not in different versions.


But yes, it should be documented somewhere; actually, I can't find a 
good place for that.


While digging, it seems that recovery_target_timeline must be given in 
decimal, not hexadecimal, which seems odd to me; and pg_controldata 
uses decimal too, not hexadecimal…


So, if this idea is correct, the given patch is not enough.

Anyway, do you think it is a good idea or not?




Regarding .po files, I don't know how to manage them. Is there any 
routine to spread the modifications? Or should I identify and change 
each message?


Don't worry about this.  This is handled elsewhere.



nice,


regards,


--
Sébastien





Re: Timeline ID hexadecimal format

2023-01-30 Thread Sébastien Lardière

On 27/01/2023 15:55, Peter Eisentraut wrote:

On 27.01.23 14:52, Sébastien Lardière wrote:
The attached patch proposes to change the format of timelineid from 
%u to %X.


I think your complaint has merit.  But note that if we did a change 
like this, then log files or reports from different versions would 
have different meaning without a visual difference, which is kind of 
what you complained about in the first place.  At least we should do 
something like 0x%X.



Hi,

Here's the patch with the suggested format; in addition, I've added a 
note to the documentation about recovery_target_timeline, because I 
don't see how strtoul(), with the special base-0 parameter, can accept 
hexadecimal without the 0x prefix; I suppose nobody uses that.


I also changed pg_controldata and the use of its output by pg_upgrade. 
I left internal usages unchanged: the content of the backup manifest 
and the content of the history file.


Should I open a commitfest entry, or is it too soon?

regards,

--
Sébastien
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 you like, add comments to a history file to record your own notes about
 how and why this particular timeline was created.  Such comments will be
 especially valuable when you have a thicket of different timelines as
-a result of experimentation.
+a result of experimentation. In both WAL segment file names and history files,
+the timeline ID number is expressed in hexadecimal.

 

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1cf53c74ea..508774cfee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 current when the base backup was taken.  The
 value latest recovers
 to the latest timeline found in the archive, which is useful in
-a standby server.  latest is the default.
+a standby server. A numerical value expressed in hexadecimal must be
+prefixed with 0x, for example 0x11.
+latest is the default.

 

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..bdbe993877 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 
 		appendStringInfo(buf, "redo %X/%X; "
-		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+		 "tli 0x%X; prev tli 0x%X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
 		 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli 0x%X; prev tli 0x%X; time %s",
 		 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 		 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..c22cf4b2a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7819,7 +7819,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (checkPoint.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-	(errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
+	(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in shutdown checkpoint record",
 			checkPoint.ThisTimeLineID, replayTLI)));
 
 		RecoveryRestartPoint(&checkPoint, record);
@@ -7906,7 +7906,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (xlrec.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-	(errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
+	(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in end-of-recovery record",
 	

Re: Timeline ID hexadecimal format

2023-01-31 Thread Sébastien Lardière

On 31/01/2023 12:26, Ashutosh Bapat wrote:

On Mon, Jan 30, 2023 at 9:35 PM Sébastien Lardière
 wrote:

On 27/01/2023 15:55, Peter Eisentraut wrote:

On 27.01.23 14:52, Sébastien Lardière wrote:

The attached patch proposes to change the format of timelineid from
%u to %X.

I think your complaint has merit.  But note that if we did a change
like this, then log files or reports from different versions would
have different meaning without a visual difference, which is kind of
what you complained about in the first place.  At least we should do
something like 0x%X.


Hi,

Here's the patch with the suggested format ; plus, I add some note in
the documentation about recovery_target_timeline, because I don't get
how strtoul(), with the special 0 base parameter can work without 0x
prefix ; I suppose that nobody use it.

I also change pg_controldata and the usage of this output by pg_upgrade.
I let internal usages unchanded : content of backup manifest and content
of history file.

The patch seems to have some special/unprintable characters in it. I
see a lot ^[[ in there. I can't read the patch because of that.

Sorry for that; it was the --color option of git diff. It's fixed, I 
hope. Thank you.


regards,

--
Sébastien
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index be05a33205..7e26b51031 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1332,7 +1332,8 @@ restore_command = 'cp /mnt/server/archivedir/%f %p'
 you like, add comments to a history file to record your own notes about
 how and why this particular timeline was created.  Such comments will be
 especially valuable when you have a thicket of different timelines as
-a result of experimentation.
+a result of experimentation. In both WAL segment file names and history files,
+the timeline ID number is expressed in hexadecimal.

 

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 1cf53c74ea..508774cfee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4110,7 +4110,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 current when the base backup was taken.  The
 value latest recovers
 to the latest timeline found in the archive, which is useful in
-a standby server.  latest is the default.
+a standby server. A numerical value expressed in hexadecimal must be
+prefixed with 0x, for example 0x11.
+latest is the default.

 

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index f390c177e4..bdbe993877 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -45,7 +45,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		CheckPoint *checkpoint = (CheckPoint *) rec;
 
 		appendStringInfo(buf, "redo %X/%X; "
-		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
+		 "tli 0x%X; prev tli 0x%X; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
 		 "oldest running xid %u; %s",
@@ -135,7 +135,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		xl_end_of_recovery xlrec;
 
 		memcpy(&xlrec, rec, sizeof(xl_end_of_recovery));
-		appendStringInfo(buf, "tli %u; prev tli %u; time %s",
+		appendStringInfo(buf, "tli 0x%X; prev tli 0x%X; time %s",
 		 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 		 timestamptz_to_str(xlrec.end_time));
 	}
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fb4c860bde..c22cf4b2a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7819,7 +7819,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (checkPoint.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-	(errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
+	(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in shutdown checkpoint record",
 			checkPoint.ThisTimeLineID, replayTLI)));
 
 		RecoveryRestartPoint(&checkPoint, record);
@@ -7906,7 +7906,7 @@ xlog_redo(XLogReaderState *record)
 		(void) GetCurrentReplayRecPtr(&replayTLI);
 		if (xlrec.ThisTimeLineID != replayTLI)
 			ereport(PANIC,
-	(errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
+	(errmsg("unexpected timeline ID 0x%X (should be 0x%X) in end-of-recovery record",
 			xlrec.ThisTimeLineID, replayTLI)));
 	}
 	else if (info == XLOG_NOOP)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index aa6c929477..1643d0d98c 100644
--- a/src/backend/a

Re: Timeline ID hexadecimal format

2023-01-31 Thread Sébastien Lardière

On 31/01/2023 10:53, Peter Eisentraut wrote:

On 30.01.23 17:05, Sébastien Lardière wrote:


Here's the patch with the suggested format ; plus, I add some note in 
the documentation about recovery_target_timeline, because I don't get 
how strtoul(), with the special 0 base parameter can work without 0x 
prefix ; I suppose that nobody use it.


I also change pg_controldata and the usage of this output by 
pg_upgrade. I let internal usages unchanded : content of backup 
manifest and content of history file.


Should I open a commitfest entry, or is it too soon ?


It is not too soon.  (The next commitfest is open for new patch 
submissions as soon as the current one is "in progress", which closes 
it for new patches.)



Done : https://commitfest.postgresql.org/42/4155/


--
Sébastien





Re: Timeline ID hexadecimal format

2023-02-01 Thread Sébastien Lardière

On 31/01/2023 20:16, Greg Stark wrote:

The fact that the *filename* has it encoded in hex is an
implementation detail and really gets exposed here because it's giving
you the underlying system error that caused the problem.



It's an implementation detail, but an exposed one, so people refer to 
the file name to find the timeline ID (that's how it happened to me).




  The confusion
only arises when the two are juxtaposed. A hint or something just in
that case might be enough?




Thanks, I get your point.

Note that my proposal was to remove the ambiguous notation, which 
occurs in some cases (as with 11 <-> 17). A hint is useless in most 
cases, because there is no ambiguity. That's why I thought of 
formatting it in hexadecimal everywhere.



At least, may I propose improving the documentation to state that the 
timeline ID appears in hexadecimal in file names but must be given in 
decimal to recovery_target_timeline and pg_waldump?



regards,

--
Sébastien





Re: PostgreSQL 13 Beta 1 Release Announcement Draft

2020-06-26 Thread Sébastien Lardière
On 19/05/2020 04:29, Jonathan S. Katz wrote:
> Hi,
>
> Attached is a draft of the release announcement for the PostgreSQL 13
> Beta 1 release this week.
>
>
Hi,

Maybe I'm too late, but in this paragraph:


> `psql` now includes the `\warn` command that is similar to the `\echo`
command
> in terms of outputting data, except `\warn` sends it to stderr. And in
case you
> need additional guidance on any of the PostgreSQL commands, the
`--help` flag
> now includes a link to
[https://www.postgresql.org](https://www.postgresql.org).

shouldn't --help be /help?

Same thing in the release notes
(https://www.postgresql.org/docs/13/release-13.html):

> Add the PostgreSQL home page to command-line |--help| output (Peter
Eisentraut)

as it probably refers to 27f3dea64833d68c1fa08c1e5d26176a579f69c8, doesn't
it?

regards,

-- 
Sébastien