GitHub user my-ship-it closed a discussion: Enable answer query using
Materialized View for external table.
### Description
Currently, for inner table, planner knows to use MV to speed up query, for
example:
```
create table tbl_heap(c1 int, c2 int);
insert into tbl_heap select generate_series(1,1000000);
create materialized view mv as select c1, c2 from tbl_heap where c1 > 10 and c2
< 20;
analyze mv;
set optimizer = off;
set enable_answer_query_using_materialized_views = on;
postgres=# explain select c1, c2 from tbl_heap where c1 > 10 and c2 < 20;
QUERY PLAN
----------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1.03 rows=1 width=8)
-> Seq Scan on mv (cost=0.00..1.01 rows=1 width=8)
Optimizer: Postgres query optimizer
(3 rows)
```
But for external table, it doesn't work
```
create external table tbl_ext(c1 int, c2 int)
location('gpfdist://localhost:8080/tbl_ext') format 'csv';
create materialized view mv_ext as select c1, c2 from tbl_ext where c1 > 10 and
c2 < 20;
analyze mv_ext;
set optimizer = off;
set enable_answer_query_using_materialized_views = on;
postgres=# explain select c1, c2 from tbl_ext where c1 > 10 and c2 < 20;
QUERY PLAN
----------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..246940.11 rows=17320508
width=8)
-> Foreign Scan on tbl_ext (cost=0.00..16000.00 rows=5773503 width=8)
Filter: ((c1 > 10) AND (c2 < 20))
Optimizer: Postgres query optimizer
(4 rows)
```
The reason why the replacement for external doesn't work is that, if external
table changes, planner doesn't know that, and replacement of MV would produce
wrong results.
But for some cases, access to external table(for example, Iceberg, Hudi, and
etc) is much slower than inner table(maybe 10X slower or more). If user knows
there is no data changes for external table, we could speed up query against
external table, which is a big win in this scenario. Another approach is that
user load external table into inner table for analysis, but there is latency,
and loading process is also time consuming.
How about we introduce a GUC to control the behavior? If the user is 100% sure
that the external table data has not been modified, they could turn on the GUC
and speed up query.
### Use case/motivation
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
GitHub link: https://github.com/apache/cloudberry/discussions/693
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]