On Mon, Mar 22, 2021 at 10:41 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > Okay, the fix makes sense. In fact, IMHO, in general also this fix > > looks like an optimization, I mean when slicelength >= > > VARRAWSIZE_4B_C(value), then why do we need to allocate extra memory > > even in the case of pglz. So shall we put this check directly in > > toast_decompress_datum_slice instead of handling it at the lz4 level? > > Yeah, I thought about that too, but do we want to assume that > VARRAWSIZE_4B_C is the correct way to get the decompressed size > for all compression methods?
I think it's OK to assume this. If and when we add a third compression method, it seems certain to just grab one of the two remaining bit patterns. Now, things get a bit more complicated if and when we want to add a fourth method, because at that point you've got to ask yourself how comfortable you feel about stealing the last bit pattern for your feature. But, if the solution to that problem were to decide that whenever that last bit pattern is used, we will add an extra byte (or word) after va_tcinfo indicating the real compression method, then using VARRAWSIZE_4B_C here would still be correct. To imagine this decision being wrong, you have to posit a world in which one of the two remaining bit patterns for the high 2 bits cause the low 30 bits to be interpreted as something other than the size, which I guess is not totally impossible, but my first reaction is to think that such a design would be (1) hard to make work and (2) unnecessarily painful. > (If so, I think it would be better style to have a less opaque macro > name for the purpose.) Complaining about the name of one particular TOAST-related macro name seems a bit like complaining about the greenhouse gasses emitted by one particular car. They're pretty uniformly terrible. Does anyone really know when to use VARATT_IS_1B_E or VARATT_IS_4B_U or any of that cruft? Like, who decided that "is this varatt 1B E?" would be a perfectly reasonable way of asking "is this varlena is TOAST pointer?". While I'm complaining, it's hard to say enough bad things about the fact that we have 12 consecutive completely obscure macro definitions for which the only comments are (a) that they are endian-dependent - which isn't even true for all of them - and (b) that they are "considered internal." Apparently, they're SO internal that they don't even need to be understandable to other developers. Anyway, this particular macro name was chosen, it seems, for symmetry with VARDATA_4B_C, but if you want to change it to something else, I'm OK with that, too. -- Robert Haas EDB: http://www.enterprisedb.com