Hi,

On 11/21/2017 03:47 PM, Ildus Kurbangaliev wrote:
> On Mon, 20 Nov 2017 00:04:53 +0100
> Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> ...
>
>> 6) I'm rather confused by AttributeCompression vs.
>> ColumnCompression. I mean, attribute == column, right? Of course,
>> one is for data from the parser, the other one is for internal
>> info. But can we make the naming clearer?
>
> For now I have renamed AttributeCompression to CompressionOptions;
> not sure that's a good name, but at least it causes less confusion.
>
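OK. For the record, what I have in mind is the FdwRoutine-style
pattern, i.e. a handler function returning a struct of callbacks,
roughly like this (just a sketch to anchor the naming discussion -
the callback names and signatures below are mine, not the patch's):

    #include "postgres.h"
    #include "catalog/pg_attribute.h"
    #include "nodes/pg_list.h"

    /* hypothetical sketch, not the patch's actual definition */
    typedef struct CompressionMethodRoutine
    {
        NodeTag     type;

        /* lifecycle callbacks for a column using this method */
        void        (*configure) (Form_pg_attribute att, List *options);
        void        (*drop) (Form_pg_attribute att);

        /* the actual (de)compression callbacks */
        struct varlena *(*compress) (const struct varlena *value,
                                     List *options);
        struct varlena *(*decompress) (const struct varlena *value,
                                       List *options);
    } CompressionMethodRoutine;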
I propose to use either CompressionMethodOptions (and
CompressionMethodRoutine) or CompressionOptions (and
CompressionRoutine).

>>
>> 7) The docs in general are somewhat unsatisfactory, TBH. For
>> example the ColumnCompression node has no comments, unlike
>> everything else in parsenodes. Similarly for the SGML docs - I
>> suggest expanding them to resemble the FDW docs
>> (https://www.postgresql.org/docs/10/static/fdwhandler.html) which
>> also follows the handler/routines pattern.
>
> I've added more comments. I think I'll add more documentation if
> the committers approve the current syntax.
>

OK. Haven't reviewed this yet.

>>
>> 8) One of the unclear things is why we even need a 'drop' routine.
>> It seems that if it's defined, DropAttributeCompression does
>> something. But what should it do? I suppose dropping the options
>> should be done using dependencies (just like we drop columns in
>> this case).
>>
>> BTW why does DropAttributeCompression mess with att->attisdropped
>> in this way? That seems a bit odd.
>
> A 'drop' routine could be useful. An extension could do something
> related to the attribute, like removing extra tables or something
> else. The compression options will not be removed after unlinking
> the compression method from a column, because compressed data will
> still be stored in that column.
>

OK. So something like a "global" dictionary used for the column, or
something like that? Sure, seems useful, and I've been thinking about
that, but I think we badly need some extension using it, even if in a
very simple way. Firstly, we need a "how to" example; secondly, we
need some way to test it.

>>
>> 13) When writing the experimental extension, I was extremely
>> confused about the regular varlena headers, custom compression
>> headers, etc. In the end I stole the code from tsvector.c and
>> whacked it a bit until it worked, but I wouldn't dare to claim I
>> understand how it works.
>>
>> This needs to be documented somewhere. For example postgres.h has
>> a bunch of paragraphs about varlena headers, so perhaps it should
>> be there? I see the patch tweaks some of the constants, but does
>> not update the comment at all.
>
> This is a good point; I'm not sure what this documentation should
> look like. I've just assumed that people should have a deep
> understanding of varlenas if they're going to compress them. But as
> it is, it's easy to make a mistake there. Maybe I should add some
> functions that help construct varlenas with different headers. I
> like the way jsonb is constructed: it uses StringInfo and there are
> a few helper functions (reserveFromBuffer, appendToBuffer and
> others). Maybe they should not be static.
>

Not sure. My main problem was not understanding how this affects the
varlena header, etc. And I had no idea where to look.

>>
>> Perhaps it would be useful to provide some additional macros
>> making access to custom-compressed varlena values easier. Or
>> perhaps VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already
>> support that? This part is not very clear to me.
>
> These macros will work; custom-compressed varlenas behave like old
> compressed varlenas.
>

OK. But then I don't understand why tsvector.c does things like

    VARSIZE(data) - VARHDRSZ_CUSTOM_COMPRESSED - arrsize
    VARRAWSIZE_4B_C(data) - arrsize

instead of

    VARSIZE_ANY_EXHDR(data) - arrsize
    VARSIZE_ANY(data) - arrsize

Seems somewhat confusing.
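To spell out my confusion, this is roughly my mental model (a sketch,
not code from the patch; VARHDRSZ_CUSTOM_COMPRESSED and
VARRAWSIZE_4B_C are the patch's definitions, while the expectations
in the comments are my assumptions):

    #include "postgres.h"   /* with the patch applied, this should
                             * define VARHDRSZ_CUSTOM_COMPRESSED etc. */

    static void
    compare_size_macros(struct varlena *data)
    {
        /* what I'd expect to give the payload size, header excluded */
        Size    generic_payload = VARSIZE_ANY_EXHDR(data);

        /* what tsvector.c computes instead */
        Size    manual_payload = VARSIZE(data) - VARHDRSZ_CUSTOM_COMPRESSED;

        /*
         * And for the total it uses the raw (uncompressed?) size
         * instead of VARSIZE_ANY, which is where I get lost.
         */
        Size    raw_size = VARRAWSIZE_4B_C(data);
        Size    any_size = VARSIZE_ANY(data);

        elog(DEBUG1, "payload %zu vs %zu, size %zu vs %zu",
             generic_payload, manual_payload, any_size, raw_size);
    }

If those pairs are supposed to be equivalent for custom-compressed
values, tsvector.c should probably just use the generic macros; if
they are not, that difference is exactly what needs documenting.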
>>> Still it's a problem if the user used for example `SELECT
>>> <compressed_column> INTO * FROM *`, because postgres will copy
>>> compressed tuples, and there will not be any dependencies
>>> between the destination and the options.
>>>
>>
>> This seems like a rather fatal design flaw, though. I'd say we
>> need to force recompression of the data in such cases. Otherwise
>> all the dependency tracking is rather pointless.
>
> Fixed this problem too. I've added recompression for datums that
> use custom compression.
>

Hmm, it still doesn't work for me. See this:

    test=# create extension pg_lz4 ;
    CREATE EXTENSION
    test=# create table t_lz4 (v text compressed lz4);
    CREATE TABLE
    test=# create table t_pglz (v text);
    CREATE TABLE
    test=# insert into t_lz4 select repeat(md5(1::text),300);
    INSERT 0 1
    test=# insert into t_pglz select * from t_lz4;
    INSERT 0 1
    test=# drop extension pg_lz4 cascade;
    NOTICE:  drop cascades to 2 other objects
    DETAIL:  drop cascades to compression options for lz4
    drop cascades to table t_lz4 column v
    DROP EXTENSION
    test=# \c test
    You are now connected to database "test" as user "user".
    test=# insert into t_lz4 select repeat(md5(1::text),300);^C
    test=# select * from t_pglz ;
    ERROR:  cache lookup failed for compression options 16419

That suggests no recompression happened: the value copied into t_pglz
still references the dropped compression options.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services