Tomas Vondra <tomas.von...@2ndquadrant.com> writes: > On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote: >> #define SizeOfMCVList(ndims,nitems) \ >> is both woefully underdocumented and completely at variance with >> reality. It doesn't seem to be accounting for the actual data values.
> I agree about the macro being underdocumented, but AFAICS it's used > correctly to check the expected length. It can't include the data values > directly, because that's variable amount of data - and it's encoded in not > yet verified part of the data. Well, it should have some other name then. Or *at least* a comment. It's unbelievably misleading as it stands. > That being said, maybe this is unnecessarily defensive and we should just > trust the values not being corrupted. No, I'm on board with checking the lengths. I just don't like how hard it is to discern what's being checked. >> I think that part of the problem here is that the way this code is >> written, "maxaligned" is no such thing. What you're actually maxaligning >> seems to be the offset from the start of the data area of a varlena value, > I don't think so. The pointers should be maxaligned with respect to the > whole varlena value, which is what 'raw' points to. [ squint ... ] OK, I think I misread this: statext_mcv_deserialize(bytea *data) { ... /* pointer to the data part (skip the varlena header) */ ptr = VARDATA_ANY(data); raw = (char *) data; I think this is confusing in itself --- I read it as "raw = (char *) ptr" and I think most other people would assume that too based on the order of operations. It'd read better as /* remember start of datum for maxalign reference */ raw = (char *) data; /* pointer to the data part (skip the varlena header) */ ptr = VARDATA_ANY(data); Another problem with this code is that it flat doesn't work for non-4-byte-header varlenas: it'd do the alignment differently than the serialization side did. That's okay given that the two extant call sites are guaranteed to pass detoasted datums. But using VARDATA_ANY gives a completely misleading appearance of being ready to deal with short-header varlenas, and heaven forbid there should be any comment to discourage future coders from trying. So really what I'd like to see here is /* remember start of datum for maxalign reference */ raw = (char *) data; /* alignment logic assumes full-size datum header */ Assert(VARATT_IS_4B(data)); /* pointer to the data part (skip the varlena header) */ ptr = VARDATA_ANY(data); Or, of course, this could all go away if we got rid of the bogus maxaligning... regards, tom lane