On Wed, Feb 3, 2021 at 2:07 AM Robert Haas <robertmh...@gmail.com> wrote: > > Even more review comments, still looking mostly at 0001: > > If there's a reason why parallel_schedule is arranging to run the > compression test in parallel with nothing else, the comment in that > file should explain the reason. If there isn't, it should be added to > a parallel group that doesn't have the maximum number of tests yet, > probably the last such group in the file. > > serial_schedule should add the test in a position that roughly > corresponds to where it appears in parallel_schedule. > > I believe it's relatively standard practice to put variable > declarations at the top of the file. compress_lz4.c and > compress_pglz.c instead put those declarations nearer to the point of > use. > > compressamapi.c has an awful lot of #include directives for the code > it actually contains. I believe that we should cut that down to what > is required by 0001, and other patches can add more later as required. > In fact, it's tempting to just get rid of this .c file altogether and > make the two functions it contains static inline functions in the > header, but I'm not 100% sure that's a good idea. > > The copyright dates in a number of the file headers are out of date. > > binary_upgrade_next_pg_am_oid and the related changes to > CreateAccessMethod don't belong in 0001, because it doesn't support > non-built-in compression methods. These changes and the related > pg_dump change should be moved to the patch that adds support for > that. > > The comments added to dumpTableSchema() say that "compression is > assigned by ALTER" but don't give a reason. I think they should. I > don't know how much they need to explain about what the code does, but > they definitely need to explain why it does it. Also, isn't this bad? > If we create the column with the wrong compression setting initially > and then ALTER it, we have to rewrite the table. If it's empty, that's > cheap, but it'd still be better not to do it at all. > > I'm not sure it's a good idea for dumpTableSchema() to leave out > specifying the compression method if it happens to be pglz. I think we > definitely shouldn't do it in binary-upgrade mode. What if we changed > the default in a future release? For that matter, even 0002 could make > the current approach unsafe.... I think, anyway. > > The changes to pg_dump.h look like they haven't had a visit from > pgindent. You should probably try to do that for the whole patch, > though it's a bit annoying since you'll have to manually remove > unrelated changes to the same files that are being modified by the > patch. Also, why the extra blank line here? > > GetAttributeCompression() is hard to understand. I suggest changing > the comment to "resolve column compression specification to an OID" > and somehow rejigger the code so that you aren't using one not-NULL > test and one NULL test on the same variable. Like maybe change the > first part to if (!IsStorageCompressible(typstorage)) { if > (compression == NULL) return InvalidOid; ereport(ERROR, ...); } > > It puzzles me that CompareCompressionMethodAndDecompress() calls > slot_getallattrs() just before clearing the slot. It seems like this > ought to happen before we loop over the attributes, so that we don't > need to call slot_getattr() every time. See the comment for that > function. But even if we didn't do that for some reason, why would we > do it here? If it's already been done, it shouldn't do anything, and > if it hasn't been done, it might overwrite some of the values we just > poked into tts_values. It also seems suspicious that we can get away > with clearing the slot and then again marking it valid. I'm not sure > it really works like that. Like, can't clearing the slot invalidate > pointers stored in tts_values[]? For instance, if they are pointers > into an in-memory heap tuple, tts_heap_clear() is going to free the > tuple; if they are pointers into a buffer, tts_buffer_heap_clear() is > going to unpin it. I think the supported procedure for this sort of > thing is to have a second slot, set tts_values, tts_isnull etc. and > then materialize the slot. After materializing the new slot, it's > independent of the old slot, which can then be cleared. See for > example tts_virtual_materialize(). The whole approach you've taken > here might need to be rethought a bit. I think you are right to want > to avoid copying everything over into a new slot if nothing needs to > be done, and I think we should definitely keep that optimization, but > I think if you need to copy stuff, you have to do the above procedure > and then continue using the other slot instead of the original one. > Some places I think we have functions that return either the original > slot or a different one depending on how it goes; that might be a > useful idea here. But, you also can't just spam-create slots; it's > important that whatever ones we end up with get reused for every > tuple. > > Doesn't the change to describeOneTableDetails() require declaring > changing the declaration of char *headers[11] to char *headers[12]? > How does this not fail Assert(cols <= lengthof(headers))? > > Why does describeOneTableDetais() arrange to truncate the printed > value? We don't seem to do that for the other column properties, and > it's not like this one is particularly long. > > Perhaps the changes to pg_am.dat shouldn't remove the blank line? > > I think the comment to pg_attribute.h could be rephrased to stay > something like: "OID of compression AM. Must be InvalidOid if and only > if typstorage is 'a' or 'b'," replacing 'a' and 'b' with whatever the > right letters are. This would be shorter and I think also clearer than > what you have > > The first comment change in postgres.h is wrong. You changed > va_extsize to "size in va_extinfo" but the associated structure > definition is unchanged, so the comment shouldn't be changed either. > > In toast_internals.h, you end using 30 as a constant several times but > have no #define for it. You do have a #define for RAWSIZEMASK, but > that's really a derived value from 30. Also, it's not a great name > because it's kind of generic. So how about something like: > > #define TOAST_RAWSIZE_BITS 30 > #define TOAST_RAWSIZE_MASK ((1 << (TOAST_RAW_SIZE_BITS + 1)) - 1) > > But then again on second thought, this 30 seems to be the same 30 that > shows up in the changes to postgres.h, and there again 0x3FFFFFFF > shows up too. So maybe we should actually be defining these constants > there, using names like VARLENA_RAWSIZE_BITS and VARLENA_RAWSIZE_MASK > and then having toast_internals.h use those constants as well. > > Taken with the email I sent yesterday, I think this is a more or less > complete review of 0001. Although there are a bunch of things to fix > here still, I don't think this is that far from being committable. I > don't at this point see too much in terms of big design problems. > Probably the CompareCompressionMethodAndDecompress() is the closest to > a design-level problem, and certainly something needs to be done about > it, but even that is a fairly localized problem in the context of the > entire patch.
Thanks, Robert for the detailed review. I will work on these comments and post the updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com