Re: making relfilenodes 56 bits

Matthias van de Meent Wed, 29 Jun 2022 15:13:12 -0700

On Wed, 29 Jun 2022 at 14:41, Simon Riggs <[email protected]> wrote:
>
> On Tue, 28 Jun 2022 at 19:18, Matthias van de Meent
> <[email protected]> wrote:
>
> > I will be the first to admit that it is quite unlikely to be common
> > practice, but this workload increases the number of dbOid+spcOid
> > combinations to 100s (even while using only a single tablespace),
>
> Which should still fit nicely in 32bits then. Why does that present a
> problem to this idea?


It doesn't, or at least not the bitspace part. I think it is indeed
quite unlikely anyone will try to build as many tablespaces as the 100
million tables project, which utilized 1000 tablespaces to get around
file system limitations [0].

The potential problem is where to store such a mapping efficiently,
especially considering that this mapping might (and likely will)
change across restarts and whenever database churn (create + drop
database) happens, e.g. in testing workloads.

> The reason to mention this now is that it would give more space than
> 56bit limit being suggested here. I am not opposed to the current
> patch, just finding ways to remove some objections mentioned by
> others, if those became blockers.
>
> > which in my opinion requires some more thought than just handwaving it
> > into an smgr array and/or checkpoint records.
>
> The idea is that we would store the mapping as an array, with the
> value in the RelFileNode as the offset in the array. The array would
> be mostly static, so would cache nicely.

That part is not quite clear to me. Any cluster may have anywhere
between 3 and hundreds or thousands of entries in that mapping. Do you
suggest dynamically growing that array (presumably shared, considering
the addressing is shared), or having a runtime parameter that limits
the number of entries (similar to max_connections)?
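
To make the question concrete, here is a minimal sketch of the second
option: a fixed-capacity array in which the slot number is the compact
value that would stand in for the dbOid+spcOid pair. Everything in it
is an assumption for illustration and not taken from the patch: the
names DbSpcMap, MAX_DBSPC_ENTRIES, dbspc_map_lookup_or_add and
dbspc_map_resolve are hypothetical, the capacity is a compile-time
constant standing in for a runtime parameter, and locking,
shared-memory allocation and slot reuse after DROP DATABASE are left
out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical capacity; stands in for a max_connections-style setting. */
#define MAX_DBSPC_ENTRIES   1024
#define INVALID_DBSPC_SLOT  UINT32_MAX

/* One (database, tablespace) combination. */
typedef struct DbSpcKey
{
    Oid         dbOid;
    Oid         spcOid;
} DbSpcKey;

/* The mapping itself: the array offset is the compact identifier. */
typedef struct DbSpcMap
{
    uint32_t    nentries;
    DbSpcKey    entries[MAX_DBSPC_ENTRIES];
} DbSpcMap;

/*
 * Return the slot for (dbOid, spcOid), appending a new entry if the
 * combination is not present yet. Returns INVALID_DBSPC_SLOT when the
 * array is full, which is the failure mode a fixed limit introduces.
 * The linear scan is only acceptable because this is a sketch.
 */
static uint32_t
dbspc_map_lookup_or_add(DbSpcMap *map, Oid dbOid, Oid spcOid)
{
    assert(map != NULL);

    for (uint32_t i = 0; i < map->nentries; i++)
    {
        if (map->entries[i].dbOid == dbOid &&
            map->entries[i].spcOid == spcOid)
            return i;
    }

    if (map->nentries >= MAX_DBSPC_ENTRIES)
        return INVALID_DBSPC_SLOT;

    map->entries[map->nentries].dbOid = dbOid;
    map->entries[map->nentries].spcOid = spcOid;
    return map->nentries++;
}

/* Translate a slot back into the full OID pair, or NULL if unused. */
static const DbSpcKey *
dbspc_map_resolve(const DbSpcMap *map, uint32_t slot)
{
    assert(map != NULL);

    if (slot >= map->nentries)
        return NULL;
    return &map->entries[slot];
}
```

Note that the slot a given pair receives depends purely on insertion
order, so unless the array itself is persisted, the same pair can land
in a different slot after a restart.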

> For convenience, I imagine that the mapping could be included in WAL
> in or near the checkpoint record, to ensure that the mapping was
> available in all backups.

Why would we need this mapping in backups, considering that it seems
to be transient state that is lost on restart? Won't we still use full
dbOid and spcOid in anything we communicate or store on disk (file
names, WAL, pg_class rows, etc.), or did I misunderstand your
proposal?

Kind regards,

Matthias van de Meent


[0] https://www.pgcon.org/2013/schedule/attachments/283_Billion_Tables_Project-PgCon2013.pdf
