Re: [HACKERS] Custom compression methods

Dilip Kumar Thu, 08 Oct 2020 02:09:25 -0700

On Wed, Oct 7, 2020 at 5:00 PM Dilip Kumar <[email protected]> wrote:
>
> On Wed, Oct 7, 2020 at 10:26 AM Dilip Kumar <[email protected]> wrote:
> >
> > On Tue, Oct 6, 2020 at 10:21 PM Tomas Vondra
> > <[email protected]> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 11:00:55AM +0530, Dilip Kumar wrote:
> > > >On Mon, Oct 5, 2020 at 9:34 PM Tomas Vondra
> > > ><[email protected]> wrote:
> > > >>
> > > >> On Mon, Oct 05, 2020 at 07:57:41PM +0530, Dilip Kumar wrote:
> > > >> >On Mon, Oct 5, 2020 at 5:53 PM Tomas Vondra
> > > >> ><[email protected]> wrote:
> > > >> >>
> > > >> >> On Mon, Oct 05, 2020 at 11:17:28AM +0530, Dilip Kumar wrote:
> > > >> >> >On Mon, Oct 5, 2020 at 3:37 AM Tomas Vondra
> > > >> >> ><[email protected]> wrote:
> > > >> >> >
> > > >> >> >Thanks, Tomas for your feedback.
> > > >> >> >
> > > >> >> >> 9) attcompression ...
> > > >> >> >>
> > > >> >> >> The main issue I see is what the patch does with attcompression. 
> > > >> >> >> Instead
> > > >> >> >> of just using it to store a the compression method, it's also 
> > > >> >> >> used to
> > > >> >> >> store the preserved compression methods. And using NameData to 
> > > >> >> >> store
> > > >> >> >> this seems wrong too - if we really want to store this info, the 
> > > >> >> >> correct
> > > >> >> >> way is either using text[] or inventing charvector or similar.
> > > >> >> >
> > > >> >> >The reason for using the NameData is the get it in the fixed part 
> > > >> >> >of
> > > >> >> >the data structure.
> > > >> >> >
> > > >> >>
> > > >> >> Why do we need that? It's possible to have varlena fields with 
> > > >> >> direct
> > > >> >> access (see pg_index.indkey for example).
> > > >> >
> > > >> >I see.  While making it NameData I was thinking whether we have an
> > > >> >option to direct access the varlena. Thanks for pointing me there. I
> > > >> >will change this.
> > > >> >
> > > >> > Adding NameData just to make
> > > >> >> it fixed-length means we're always adding 64B even if we just need a
> > > >> >> single byte, which means ~30% overhead for the 
> > > >> >> FormData_pg_attribute.
> > > >> >> That seems a bit unnecessary, and might be an issue with many 
> > > >> >> attributes
> > > >> >> (e.g. with many temp tables, etc.).
> > > >> >
> > > >> >You are right.  Even I did not like to keep 64B for this, so I will 
> > > >> >change it.
> > > >> >
> > > >> >>
> > > >> >> >> But to me this seems very much like a misuse of attcompression 
> > > >> >> >> to track
> > > >> >> >> dependencies on compression methods, necessary because we don't 
> > > >> >> >> have a
> > > >> >> >> separate catalog listing compression methods. If we had that, I 
> > > >> >> >> think we
> > > >> >> >> could simply add dependencies between attributes and that 
> > > >> >> >> catalog.
> > > >> >> >
> > > >> >> >Basically, up to this patch, we are having only built-in 
> > > >> >> >compression
> > > >> >> >methods and those can not be dropped so we don't need any 
> > > >> >> >dependency
> > > >> >> >at all.  We just want to know what is the current compression 
> > > >> >> >method
> > > >> >> >and what is the preserve compression methods supported for this
> > > >> >> >attribute.  Maybe we can do it better instead of using the NameData
> > > >> >> >but I don't think it makes sense to add a separate catalog?
> > > >> >> >
> > > >> >>
> > > >> >> Sure, I understand what the goal was - all I'm saying is that it 
> > > >> >> looks
> > > >> >> very much like a workaround needed because we don't have the 
> > > >> >> catalog.
> > > >> >>
> > > >> >> I don't quite understand how could we support custom compression 
> > > >> >> methods
> > > >> >> without listing them in some sort of catalog?
> > > >> >
> > > >> >Yeah for supporting custom compression we need some catalog.
> > > >> >
> > > >> >> >> Moreover, having the catalog would allow adding compression 
> > > >> >> >> methods
> > > >> >> >> (from extensions etc) instead of just having a list of hard-coded
> > > >> >> >> compression methods. Which seems like a strange limitation, 
> > > >> >> >> considering
> > > >> >> >> this thread is called "custom compression methods".
> > > >> >> >
> > > >> >> >I think I forgot to mention while submitting the previous patch 
> > > >> >> >that
> > > >> >> >the next patch I am planning to submit is, Support creating the 
> > > >> >> >custom
> > > >> >> >compression methods wherein we can use pg_am catalog to insert the 
> > > >> >> >new
> > > >> >> >compression method.  And for dependency handling, we can create an
> > > >> >> >attribute dependency on the pg_am row.  Basically, we will create 
> > > >> >> >the
> > > >> >> >attribute dependency on the current compression method AM as well 
> > > >> >> >as
> > > >> >> >on the preserved compression methods AM.   As part of this, we will
> > > >> >> >add two build-in AMs for zlib and pglz, and the attcompression 
> > > >> >> >field
> > > >> >> >will be converted to the oid_vector (first OID will be of the 
> > > >> >> >current
> > > >> >> >compression method, followed by the preserved compression method's
> > > >> >> >oids).
> > > >> >> >
> > > >> >>
> > > >> >> Hmmm, ok. Not sure pg_am is the right place - compression methods 
> > > >> >> don't
> > > >> >> quite match what I though AMs are about, but maybe it's just my 
> > > >> >> fault.
> > > >> >>
> > > >> >> FWIW it seems a bit strange to first do the attcompression magic and
> > > >> >> then add the catalog later - I think we should start with the 
> > > >> >> catalog
> > > >> >> right away. The advantage is that if we end up committing only some 
> > > >> >> of
> > > >> >> the patches in this cycle, we already have all the infrastructure 
> > > >> >> etc.
> > > >> >> We can reorder that later, though.
> > > >> >
> > > >> >Hmm,  yeah we can do this way as well that first create a new catalog
> > > >> >table and add entries for these two built-in methods and the
> > > >> >attcompression can store the oid vector.  But if we only commit the
> > > >> >build-in compression methods part then does it make sense to create an
> > > >> >extra catalog or adding these build-in methods to the existing catalog
> > > >> >(if we plan to use pg_am).  Then in attcompression instead of using
> > > >> >one byte for each preserve compression method, we need to use oid.  So
> > > >> >from Robert's mail[1], it appeared to me that he wants that the
> > > >> >build-in compression methods part should be independently committable
> > > >> >and if we think from that perspective then adding a catalog doesn't
> > > >> >make much sense.  But if we are planning to commit the custom method
> > > >> >also then it makes more sense to directly start with the catalog
> > > >> >because that way it will be easy to expand without much refactoring.
> > > >> >
> > > >> >[1] 
> > > >> >https://www.postgresql.org/message-id/CA%2BTgmobSDVgUage9qQ5P_%3DF_9jaMkCgyKxUQGtFQU7oN4kX-AA%40mail.gmail.com
> > > >> >
> > > >>
> > > >> Hmmm. Maybe I'm missing something subtle, but I think that plan can be
> > > >> interpreted in various ways - it does not really say whether the 
> > > >> initial
> > > >> list of built-in methods should be in some C array, or already in a 
> > > >> proper
> > > >> catalog.
> > > >>
> > > >> All I'm saying is it seems a bit weird to first implement dependencies
> > > >> based on strange (mis)use of attcompression attribute, and then replace
> > > >> it with a proper catalog. My understanding is those patches are 
> > > >> expected
> > > >> to be committable one by one, but the attcompression approach seems a
> > > >> bit too hacky to me - not sure I'd want to commit that ...
> > > >
> > > >Okay, I will change this.  So I will make create a new catalog
> > > >pg_compression and add the entry for two built-in compression methods
> > > >from the very first patch.
> > > >
> > >
> > > OK.
>
> I have changed the first 2 patches, basically, now we are providing a
> new catalog pg_compression and the pg_attribute is storing the oid of
> the compression method.  The patches still need some cleanup and there
> is also one open comment that for index we should use its table
> compression.
>
> I am still working on the preserve patch.  For preserving the
> compression method I am planning to convert the attcompression field
> to the oidvector so that we can store the oid of the preserve method
> also.  I am not sure whether we can access this oidvector as a fixed
> part of the FormData_pg_attribute or not.  The reason is that for
> building the tuple descriptor, we need to give the size of the fixed
> part (#define ATTRIBUTE_FIXED_PART_SIZE \
> (offsetof(FormData_pg_attribute,attcompression) + sizeof(Oid))).  But
> if we convert this to the oidvector then we don't know the size of the
> fixed part.  Am I missing something?


I could think of two solutions here
Sol1.
Make the first oid of the oidvector as part of the fixed size, like below
#define ATTRIBUTE_FIXED_PART_SIZE \
(offsetof(FormData_pg_attribute, attcompression) + OidVectorSize(1))

Sol2:
Keep attcompression as oid only and for the preserve list, adds
another field in the variable part which will be of type oidvector.  I
think most of the time we need to access the current compression
method and with this solution, we will be able to access that as part
of the tuple desc.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Custom compression methods

Reply via email to