On Wed, 29 Jun 2022 at 14:41, Simon Riggs <simon.ri...@enterprisedb.com> wrote:
>
> On Tue, 28 Jun 2022 at 19:18, Matthias van de Meent
> <boekewurm+postg...@gmail.com> wrote:
>
> > I will be the first to admit that it is quite unlikely to be common
> > practise, but this workload increases the number of dbOid+spcOid
> > combinations to 100s (even while using only a single tablespace),
>
> Which should still fit nicely in 32bits then. Why does that present a
> problem to this idea?

It doesn't, or at least not the bitspace part. I think it is indeed
quite unlikely anyone will try to build as many tablespaces as the 100
million tables project, which utilized 1000 tablespaces to get around
file system limitations [0].

The potential problem is 'where to store such a mapping efficiently',
especially considering that this mapping might (and likely will)
change across restarts and when database churn (create + drop
database) happens in e.g. testing workloads.

> The reason to mention this now is that it would give more space than
> 56bit limit being suggested here. I am not opposed to the current
> patch, just finding ways to remove some objections mentioned by
> others, if those became blockers.
>
> > which in my opinion requires some more thought than just handwaving it
> > into an smgr array and/or checkpoint records.
>
> The idea is that we would store the mapping as an array, with the
> value in the RelFileNode as the offset in the array. The array would
> be mostly static, so would cache nicely.

That part is not quite clear to me. Any cluster may have anywhere
between 3 and hundreds or thousands of entries in that mapping. Do you
suggest dynamically growing that (presumably shared, considering the
addressing is shared) array, or having a runtime parameter that limits
the number of those entries (similar to max_connections)?

> For convenience, I imagine that the mapping could be included in WAL
> in or near the checkpoint record, to ensure that the mapping was
> available in all backups.

Why would we need this mapping in backups, considering that it seems
to be transient state that is lost on restart? Won't we still use the
full dbOid and spcOid in anything we communicate or store on disk
(file names, WAL, pg_class rows, etc.), or did I misunderstand your
proposal?

Kind regards,

Matthias van de Meent

[0] https://www.pgcon.org/2013/schedule/attachments/283_Billion_Tables_Project-PgCon2013.pdf
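
For illustration only, and not taken from any patch: a minimal Python
sketch of the array mapping discussed above, to make the two sizing
options and the "transient state" concern concrete. The class
LocatorMap, its methods, and the max_entries parameter are all
hypothetical names; plain integers stand in for PostgreSQL Oids, and
nothing here models shared memory or locking.

```python
from typing import Dict, List, Optional, Tuple

# (dbOid, spcOid); plain ints stand in for PostgreSQL Oids (assumption).
Key = Tuple[int, int]


class LocatorMap:
    """Hypothetical in-memory array: small slot number -> (dbOid, spcOid)."""

    def __init__(self, max_entries: Optional[int] = None) -> None:
        # max_entries=None models a dynamically growing array; an integer
        # models a fixed limit set by a (hypothetical) runtime parameter.
        self.max_entries = max_entries
        self.slots: List[Key] = []       # slot number -> key
        self.index: Dict[Key, int] = {}  # key -> slot number

    def slot_for(self, db_oid: int, spc_oid: int) -> int:
        """Return the slot for a pair, assigning the next free one on first use."""
        key = (db_oid, spc_oid)
        if key in self.index:
            return self.index[key]
        if self.max_entries is not None and len(self.slots) >= self.max_entries:
            raise RuntimeError("mapping is full; the limit would need raising")
        self.slots.append(key)
        self.index[key] = len(self.slots) - 1
        return self.index[key]

    def resolve(self, slot: int) -> Key:
        """Translate a slot number back to the full (dbOid, spcOid) pair."""
        return self.slots[slot]


# Slot numbers depend on assignment order, so they are not stable across
# a rebuild of the map (used here as a stand-in for a restart).
before = LocatorMap()
a = before.slot_for(16384, 1663)   # example values only
b = before.slot_for(16385, 1663)

after = LocatorMap()
b_again = after.slot_for(16385, 1663)

print(a, b, b_again)  # prints "0 1 0": the same pair got a different slot
```

In this sketch the same (dbOid, spcOid) pair lands in slot 1 before the
rebuild and slot 0 after it, which is why anything written to disk
would either have to keep the full identifiers or the mapping itself
would have to be persisted.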