RE: Partial aggregates pushdown

fujii.y...@df.mitsubishielectric.co.jp Sun, 07 Jul 2024 14:46:55 -0700

Hi Jelte and hackers,

I've reconsidered which of the following two approaches is better.
  Approach1: Adding export/import functions to transmit state values.
  Approach2: Adding native types which are equal to state values.


In my mind, Approach1 is superior. Therefore, if there are no objections this 
week, I plan to resume implementing Approach1 next week. I would appreciate it 
if anyone could discuss the topic with me or ask questions.

I believe that while Approach1 has the extendability to support situations 
where local and remote major versions differ, Approach2 lacks this 
extendability. Additionally, it seems that Approach1 requires fewer additional 
lines of code compared to Approach2. I'm also concerned that Approach2 may 
cause the catalog pg_type to bloat.

Although Approach2 offers the benefit of avoiding the addition of columns to 
pg_aggregate, I think this benefit is smaller than the advantages of Approach1 
mentioned above.

Next, I will present my complete comparison. The comparison points are as 
follows:
  1. Extendability
  2. Amount of code
  3. Catalog size
  4. Developer burden
  5. Additional columns to catalogs

1. Extendability
I believe it is crucial to support scenarios where the local and remote major 
versions may differ in the future (see below).

https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

Regarding this aspect, I consider Approach1 superior to Approach2. The reasons 
are as follows:
・The data type of an aggregate function's state value may change with each 
major version increment.
・In Approach1, by extending the export/import functionalities to include the 
major version in which the state value was created (refer to p.16 and p.17 of 
[1]), I can handle such situations.
・On the other hand, it appears that Approach2 fundamentally lacks the 
capability to support these scenarios.
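
As a rough illustration of the version-tagging idea (a sketch only, not code 
from the patch): the exported value carries the major version that produced 
it, and the import side picks the matching layout. The cutoff version, the 
two layouts, and all names below are hypothetical.

```python
import struct

# Hypothetical values: neither the cutoff version nor the layouts come from the patch.
LAYOUT_CHANGE_VERSION = 18
VERSION_HEADER = struct.Struct("!H")  # major version that produced the state
OLD_LAYOUT = struct.Struct("!iq")     # assumed older state: int32 count, int64 sum
NEW_LAYOUT = struct.Struct("!qq")     # assumed newer state: int64 count, int64 sum


def layout_for(major_version):
    """Pick the state layout used by the given major version."""
    if major_version >= LAYOUT_CHANGE_VERSION:
        return NEW_LAYOUT
    return OLD_LAYOUT


def export_state(major_version, count, total):
    """Serialize (count, sum), prefixed with the producing major version."""
    payload = layout_for(major_version).pack(count, total)
    return VERSION_HEADER.pack(major_version) + payload


def import_state(data):
    """Read the version header first, then decode the payload accordingly."""
    (major_version,) = VERSION_HEADER.unpack_from(data)
    count, total = layout_for(major_version).unpack_from(data, VERSION_HEADER.size)
    return major_version, count, total
```

With this shape, a local server can accept a state exported by a remote 
server of a different major version, because the layout is chosen from the 
header rather than from the local type definition.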

2. Amount of code
Regarding this aspect, I find Approach1 to be better than Approach2.
In Approach1, developers only need to add export/import functions and can use a 
standardized format for transmitting state values.
In Approach2, developers have two options:
  Option1: Adding typinput/typoutput and typsend/typreceive.
  Option2: Adding typinput/typoutput only.
Option1 requires more lines of code, which may be seen as cumbersome by some 
developers.
Option2 restricts developers to using only text representation for transmitting 
state values, which I consider limiting.
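
To make the difference between the two options concrete, here is a minimal 
sketch with an assumed state of (count, sum). The function names only mirror 
the roles of typoutput/typinput and typsend/typreceive; they are 
illustrative, not PostgreSQL APIs.

```python
import struct

BINARY_LAYOUT = struct.Struct("!qd")  # assumed state: int64 count, float64 sum


def state_out(count, total):
    """Text output, playing the role of typoutput."""
    return f"{count} {total!r}"


def state_in(text):
    """Text input, playing the role of typinput."""
    count, total = text.split()
    return int(count), float(total)


def state_send(count, total):
    """Binary output, playing the role of typsend."""
    return BINARY_LAYOUT.pack(count, total)


def state_recv(data):
    """Binary input, playing the role of typreceive."""
    return BINARY_LAYOUT.unpack(data)
```

Option1 corresponds to writing all four functions; Option2 corresponds to 
writing only the first two and always transmitting the text form.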

3. Catalog size
Regarding this point, I believe Approach1 is better than Approach2.
In Approach1, theoretically, it is necessary to add export/import functions to 
pg_proc for each aggregate.
In Approach2, theoretically, it is necessary to add typoutput/typinput 
functions (and typsend/typreceive if necessary) to pg_proc and add a native 
type to pg_type for each aggregate.
I would like to emphasize that we should consider user-defined functions in 
addition to built-in aggregate functions.
I think most developers prefer to avoid bloating catalogs, even if they may not 
be able to specify exact reasons.
In fact, in Robert's previous review, he expressed a similar concern (see 
below).

https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com
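
The per-aggregate counts in this point can be tallied with a small sketch. 
It assumes the theoretical worst case described above, where every aggregate 
needs its own functions and (for Approach2) its own type; the helper names 
are hypothetical.

```python
def catalog_rows_approach1(n_aggregates):
    """Approach1: one export and one import function per aggregate."""
    return {"pg_proc": 2 * n_aggregates, "pg_type": 0}


def catalog_rows_approach2(n_aggregates, with_binary_io=False):
    """Approach2: typinput/typoutput (plus typsend/typreceive if used)
    and one native type per aggregate."""
    funcs_per_aggregate = 4 if with_binary_io else 2
    return {"pg_proc": funcs_per_aggregate * n_aggregates,
            "pg_type": n_aggregates}
```

For any number of aggregates, Approach2 adds at least as many pg_proc rows as 
Approach1, plus one pg_type row per aggregate.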

4. Developer burden
Regarding this aspect, I believe Approach1 is better than Approach2.
In Approach1, developers have the following additional tasks:
  Task1-1: Create and define export/import functions.

In Approach2, developers have the following additional tasks:
  Task2-1: Create and define typoutput/typinput functions (and typsend/typreceive 
functions if necessary).
  Task2-2: Define a native type.

Approach1 requires fewer additional tasks, although the difference may not be 
substantial.

5. Additional columns to catalogs
Regarding this aspect, Approach2 is better than Approach1.
Approach1 requires three additional columns in pg_aggregate: the 
aggpartialpushdownsafe flag, a reference to the export function, and a 
reference to the import function.
Approach2 does not require any additional columns in catalogs.
However, over the past four years of discussions, no one has expressed concerns 
about additional columns in catalogs.

[1] 
https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation
