Re: Statistics Import and Export

Jeff Davis Tue, 01 Apr 2025 15:06:22 -0700

On Tue, 2025-04-01 at 13:44 -0500, Nathan Bossart wrote:
> Apologies for the noise.  I noticed one more way to simplify 0002. 
> As
> before, there should be no functional differences.



To restate the problem: one of the problems being solved here is that
the existing code for custom-format dumps calls WriteToc twice. That
was not a big problem before this patch, when the contents of the
entries was easily accessible in memory. But the point of 0002 is to
avoid keeping all of the stats in memory at once, because that causes
bloat; and instead to query it on demand.

In theory, we could fix the pre-existing code by making the second pass
able to jump over the other contents of the entry and just update the
data offsets. But that seems invasive, at least to do it properly.

0001 sidesteps the problem by skipping the second pass if data's not
being dumped (because there are no offsets that need updating). The
worst case is when there are a lot of objects with a small amount of
data. But that's a worst case for stats in general, so I don't think
that needs to be solved here.

Issuing the stats queries twice is not great, though. If there's any
non-deterministic output in the query, that could lead to strangeness.
How bad can that be? If the results change in some way that looks
benign, but changes the length of the definition string, can it lead to
corruption of a ToC entry? I'm not saying there's a problem, but trying
to understand the risk of future problems.

For 0003, it makes an assumption about the way the scan happens in
WriteToc(). Can you add some additional sanity checks to verify that
something doesn't happen in a different order than we expect?

Also, why do we need the clause "WHERE s.tablename = ANY($2)"? Isn't
that already implied by "JOIN unnest($1, $2) ... s.tablename =
u.tablename"?

Regards,
        Jeff Davis

Re: Statistics Import and Export

Reply via email to