On Tue, 2025-04-01 at 13:44 -0500, Nathan Bossart wrote: > Apologies for the noise. I noticed one more way to simplify 0002. > As > before, there should be no functional differences.
To restate the problem: one of the problems being solved here is that the existing code for custom-format dumps calls WriteToc twice. That was not a big problem before this patch, when the contents of the entries was easily accessible in memory. But the point of 0002 is to avoid keeping all of the stats in memory at once, because that causes bloat; and instead to query it on demand. In theory, we could fix the pre-existing code by making the second pass able to jump over the other contents of the entry and just update the data offsets. But that seems invasive, at least to do it properly. 0001 sidesteps the problem by skipping the second pass if data's not being dumped (because there are no offsets that need updating). The worst case is when there are a lot of objects with a small amount of data. But that's a worst case for stats in general, so I don't think that needs to be solved here. Issuing the stats queries twice is not great, though. If there's any non-deterministic output in the query, that could lead to strangeness. How bad can that be? If the results change in some way that looks benign, but changes the length of the definition string, can it lead to corruption of a ToC entry? I'm not saying there's a problem, but trying to understand the risk of future problems. For 0003, it makes an assumption about the way the scan happens in WriteToc(). Can you add some additional sanity checks to verify that something doesn't happen in a different order than we expect? Also, why do we need the clause "WHERE s.tablename = ANY($2)"? Isn't that already implied by "JOIN unnest($1, $2) ... s.tablename = u.tablename"? Regards, Jeff Davis