On Tue, Jul 22, 2025 at 1:24 PM Nikita Malakhov <huku...@gmail.com> wrote:
>
> Hi Michael!
>
> Yes, I know about relation rewrite and have already thought about how
> we can avoid excessive storage of toastrelid and do not spoil rewrite,
> still do not have a good enough solution.

The high-level idea would be to do any actual rewrite -- as opposed to
plain vacuum, which only frees empty space within the TOAST relation --
as part of the vacuum of the main relation.

Another option would be to store a back-pointer to the heap tuple
inside the toast tuple and use that when rewriting, though it has its
own set of complexities.

> > You have some interesting points around
> > detoast_external_attr() and detoast_attr_slice(), as far as I can see.
> > One point of the on-disk TOAST refactoring is that we should be able
> > to entirely avoid this level of redirection. I get that this is a
> > POC, of course, but it provides pointers that what I've done may not
> > be sufficient in terms of extensibility so that seems worth digging
> > into.
>
> I'm currently re-visiting our TOAST API patch set, there are some
> good (in terms of simplicity and lightweightness) proposals, will mail
> later.

Sounds interesting.

> Some more thoughts on TIDs:
> TIDs could be stored as a list instead of a chain (as Hannu proposes
> in his design). This allows batch operations and storage optimization
> 'cause TID lists are highly compressible, but significantly complicates
> the code responsible for chunk processing.

I would not say it complicates the *code* very much, especially when
you keep offsets in the toast tuples so that you can copy them into
the final materialized datum in any order.

And it does allow many optimisations in terms of batching,
pre-fetching and even parallelism in case of huge toasted values.

> Also, Toast pointer in current state must store raw size and external
> size - these two are used by the executor, and we cannot get rid
> of them so lightly.

Are these ever used without actually using the data?

When the data _is_ also used, the cost of getting the length from
the toast record with direct toast should mostly amortize over the
full query.

Can you point to where in the code this is done?
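
Going back to the TID list point for a moment: to illustrate why
offsets in the toast tuples keep the chunk processing simple, here is
a minimal sketch in Python rather than backend C. ToastChunk and
materialize() are hypothetical names made up for this mail, not
anything in the patch, and the sketch assumes the total size of the
value is known up front.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToastChunk:
    """Hypothetical toast tuple: a payload plus its byte offset in the full value."""
    offset: int
    data: bytes


def materialize(chunks: Iterable[ToastChunk], total_size: int) -> bytes:
    """Copy chunks into the final datum in whatever order they arrive."""
    buf = bytearray(total_size)
    filled = 0
    for chunk in chunks:
        end = chunk.offset + len(chunk.data)
        if chunk.offset < 0 or end > total_size:
            raise ValueError("chunk falls outside the datum")
        # Each chunk knows where it belongs, so arrival order is irrelevant.
        buf[chunk.offset:end] = chunk.data
        filled += len(chunk.data)
    # Cheap sanity check on the byte count only; it does not prove full coverage.
    if filled != total_size:
        raise ValueError("chunk sizes do not add up to total_size")
    return bytes(buf)
```

Since no chunk depends on the previous one, the fetches behind the
loop could be batched, prefetched or split across workers without
changing the copy step.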

In the long run we may also want to store the actual length in the
toast record (not the toast pointer) for types where length() != octet
size, because currently a simple call like length(text) has to
materialize the whole thing before getting the length, whereas
pg_column_size() and octet_length() are instantaneous.

> Vacuuming such a table would be a pain in the ass, we have to
> somehow prevent bloating tables with a high update rate.

Normal vacuum should work fine. It is the rewrite that is tricky.

> Also, current toast mechanics is insert-only, it does not support
> updates (just to remind - the whole toasted value is marked dead
> and new one is inserted during update), this is a subject to change.
> And logical replication, as I mentioned earlier, does not have any
> means for replicating toast diffs.

Which points to the need to (optionally) store the diff in the toast as
well when there are defined replication slots.

Once we have a way to actually do JSON(B) updates at SQL or function
level, we may even want to store the JSON in some base JSON +
JSON_PATCH format where we materialize at retrieval.

But this goes way beyond the current patch's scope, though my design
should accommodate it nicely.

---
Hannu
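
P.S. On the length() versus octet size point above, a small sketch of
why the two differ in cost. It is Python, not backend code, it assumes
UTF-8 encoded text, and both function names are made up for
illustration. The byte size is known without looking at the payload,
while the character count needs a pass over every byte unless it has
been stored somewhere.

```python
def octet_size(raw: bytes) -> int:
    """Size in bytes: available without inspecting the payload itself."""
    return len(raw)


def char_length_utf8(raw: bytes) -> int:
    """Character count: has to look at every byte of the value.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx, so counting
    the bytes that are not continuation bytes counts the characters.
    """
    return sum(1 for b in raw if (b & 0xC0) != 0x80)
```

For a value with multi-byte characters the two results differ, which
is the case where a length stored in the toast record would save the
full scan.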