Hi.
fujii.y...@df.mitsubishielectric.co.jp писал(а) 2024-02-22 10:20:
Hi. Mr.Haas, hackers.
I apologize for the significant delay since my last post.
I have conducted investigations and considerations regarding the
remaining tasks as follows.
Would it be possible for you to review them?
In particular, could you please confirm if the approach mentioned in 1.
is acceptable?
If there are no issues with the direction outlined in 1., I plan to
make a simple prototype based on this approach.
1. Transmitting state value safely between machines
From: Robert Haas <robertmh...@gmail.com>
Sent: Wednesday, December 6, 2023 10:25 PM
the problems around transmitting
serialized bytea blobs between machines that can't be assumed to fully
trust each other will need to be addressed in some
way, which seems like it will require a good deal of design work,
forming some kind of consensus, and then implementation
work to follow.
I have considered methods for safely transmitting state values between
different machines.
I have taken into account the version policy of PostgreSQL (5 years of
support) and the major version release cycle over the past 10 years (1
year), and as a result, I have made the assumption that transmission is
allowed only when the difference between the local version and the
remote version is 5 or less.
I believe that by adding new components, "export function" and "import
function", to the aggregate functions, and further introducing a new
SQL keyword to the query syntax of aggregate expressions, we can
address this issue.
If the version of the local server is higher than or equal to the
version of the remote server, the proposed method can be simplified.
The export version mentioned later in (1) would not be necessary.
Furthermore, if the version of the local server matches the version of
the remote server, the proposed method can be further simplified.
I would appreciate your input on reasonable assumptions regarding the
differences in versions between the local server and the remote server.
I will explain the specifications of the export function, import
function, the new SQL keyword for aggregate expressions, and the
behavior of query processing for partial aggregation separately.
(1) Export Function Specification
This function is another final function for partial aggregate.
This function converts the state value that represents the result of
partial aggregation into a format that can be read by the local server.
This function is called instead of the existing finalfunc during the
final stage of aggregation when performing partial aggregation.
The conversion process described above will be referred to as "export".
The argument of an export function is the version of the server that
will receive the return value.
Hereafter, this version will be referred to as the export version.
The concept of an export version is necessary to handle cases where the
version of the local server is smaller than the version of the remote
server.
The return value of the export function is the transformed state value,
and its data type is bytea.
For backward compatibility, the developer of the export function must
ensure that the export can be performed for major versions up to five
versions prior to the major version of PostgreSQL that the export
function is being developed for.
For built-in functions, I believe it is necessary to allow for the
possibility of not developing the export functionality for specific
versions in the future (due to reasons such as development burden)
after the export function is developed for a certain version.
To achieve this, for built-in functions, we will add a column to the
pg_aggregate catalog that indicates the presence or absence of export
functionality for each major version, including the major version being
developed and the previous five major versions. This column will be
named safety_export_versions and will have a data type of boolean[6].
For user-defined functions, we will refer to the extensions option and
add an external server option called safety_export_extensions, which
will maintain a list of extensions that include only the aggregate
functions that can be exported to the local server version.
...
I honestly think that achieving cross-version compatibility in this way
puts a significant burden on developers. Can we instead always use the
more or less universal export and import function to fix possible issues
with binary representations on different architectures and just refuse
to push down partial aggregates on server version mismatch? At least at
the first step?
3. Fixing the behavior when the HAVING clause is present
From: Robert Haas <robertmh...@gmail.com>
Sent: Tuesday, November 28, 2023 4:08 AM
On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov
<a.pyha...@postgrespro.ru> wrote:
> Hi. HAVING is also a problem. Consider the following query
>
> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> foreign server as HAVING needs full aggregate result, but foreign
> server don't know it.
I don't see it that way. What we would push to the foreign server
would be something like SELECT count(a) FROM t. Then,
after we get the results back and combine the various partial counts
locally, we would locally evaluate the HAVING clause
afterward. That is, partial aggregation is a barrier to pushing down
HAVING clause itself, but it doesn't preclude pushing
down the aggregation.
I have made modifications in the attached patch to ensure that when the
HAVING clause is present, the HAVING clause is executed locally while
the partial aggregations are pushed down.
Sorry, I don't see how it works. When we have partial aggregates and
having clause, foreign_grouping_ok() returns false and
add_foreign_grouping_paths() adds no paths.
I'm not saying it's necessary to fix this in the first patch version.
explain verbose select sum(a) from pagg_tab having sum(a)>10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=2282.49..2282.50 rows=1 width=8)
Output: sum(pagg_tab.a)
Filter: (sum(pagg_tab.a) > 10)
-> Append (cost=760.81..2282.48 rows=3 width=8)
-> Partial Aggregate (cost=760.81..760.82 rows=1 width=8)
Output: PARTIAL sum(pagg_tab.a)
-> Foreign Scan on public.fpagg_tab_p1 pagg_tab
(cost=100.00..753.50 rows=2925 width=4)
Output: pagg_tab.a
Remote SQL: SELECT a FROM public.pagg_tab_p1
-> Partial Aggregate (cost=760.81..760.82 rows=1 width=8)
Output: PARTIAL sum(pagg_tab_1.a)
-> Foreign Scan on public.fpagg_tab_p2 pagg_tab_1
(cost=100.00..753.50 rows=2925 width=4)
Output: pagg_tab_1.a
Remote SQL: SELECT a FROM public.pagg_tab_p2
-> Partial Aggregate (cost=760.81..760.82 rows=1 width=8)
Output: PARTIAL sum(pagg_tab_2.a)
-> Foreign Scan on public.fpagg_tab_p3 pagg_tab_2
(cost=100.00..753.50 rows=2925 width=4)
Output: pagg_tab_2.a
Remote SQL: SELECT a FROM public.pagg_tab_p3
Also I have some minor notices on the code.
contrib/postgres_fdw/deparse.c: comment before appendFunctionName() has
gone, this seems to be wrong.
In finalize_aggregate()
1079 /*
1080 * Apply the agg's finalfn if one is provided, else return
transValue.
1081 */
Comment should be updated to note behavior for agg_partial aggregates.
1129 else if (peragg->aggref->agg_partial
1130 && (peragg->aggref->aggtranstype ==
INTERNALOID)
1131 && OidIsValid(peragg->serialfn_oid))
In this if branch, should we check just for peragg->aggref->agg_partial
and peragg->aggref->aggtranstype == INTERNALOID? It seems that if
peragg->aggref->aggtranstype == INTERNALOID and there's no
serialfn_oid, it's likely an error (and one should be generated).
Overall patch seems nicer. Will look at it more this week.
--
Best regards,
Alexander Pyhalov,
Postgres Professional