On Wed, Sep 2, 2020 at 4:57 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote:
>
> > On Aug 13, 2020, at 4:48 AM, Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > v1-0001: As suggested by Robert, it provides the syntax support for
> > setting the compression method for a column while creating a table and
> > adding columns. However, we don't support changing the compression
> > method for an existing column. As part of this patch, there is only
> > one built-in compression method that can be set (pglz). In this, we
> > have one built-in AM (pglz), and the compressed attributes will
> > directly store the OID of the AM. In this patch, I have removed
> > pg_attr_compression, as we don't support changing the compression
> > for an existing column, so we don't need to preserve the old
> > compressions.
>
> I do not like the way pglz compression is handled in this patch. After
> upgrading PostgreSQL to the first version with this patch included,
> pre-existing on-disk compressed data will not include any custom
> compression Oid in the header, and toast_decompress_datum will notice
> that and decompress the data directly using pglz_decompress. If the
> same data were then written back out, perhaps to another table, into a
> column with no custom compression method defined, it would get
> compressed by toast_compress_datum using DefaultCompressionOid, which
> is defined as PGLZ_COMPRESSION_AM_OID. That isn't a proper round-trip
> for the data: when it gets re-compressed, PGLZ_COMPRESSION_AM_OID gets
> written into the header, which makes the data a bit longer, but also
> means that it is not byte-for-byte the same as it was, which is
> counter-intuitive. Given that any given pglz-compressed datum now has
> two totally different formats that might occur on disk, code may have
> to consider both of them, which increases code complexity, and
> regression tests will need to be written with coverage for both of
> them, which increases test complexity.
>
> It's also not easy to write the extra tests, as there isn't any way
> (that I see) to intentionally write out the traditional shorter form
> from a newer database server; you'd have to do something like a
> pg_upgrade test where you install an older server to write the older
> format, upgrade, and then check that the new server can handle it.
>
> The cleanest solution would seem to be removing the compression AM's
> Oid from the header for all compression AMs, so that pre-patch and
> post-patch written data look exactly the same. The other solution is
> to give pglz pride of place as the original compression mechanism and
> just say that when pglz is the compression method, no Oid gets written
> to the header; only when other compression methods are used does the
> Oid get written. This second option seems closer to the implementation
> you already have, because you already handle decompression of data
> where the Oid is lacking, so all you have to do is intentionally not
> write the Oid when compressing with pglz.
>
> Or did I misunderstand your implementation?
Thanks for looking into it. Actually, I am planning to change this patch
so that we use the upper 2 bits of the size field, instead of storing
the AM OID, for the built-in compression methods: 00 = pglz, 01 = zlib,
10 = other built-in, 11 = custom compression method. Only when the bits
are 11 will we store the AM OID in the toast header. I think I will
make these changes and post an updated patch in a week or two.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com