Re: Delta Materialized View Refreshes?

2018-11-04 Thread denty
Hi folks,

I had a crack at this, and it was pretty simple to get something working to
play around with, and it seems like it might be useful.

I developed it against 10.1, as that's what I happened to be working with at
the time. The patch is pretty small, and I hoped it would apply cleanly
against 11. Unfortunately it doesn't, but I doubt the issues are
substantial. If there is interest in moving this forward, I'll update and
re-share.

The patch enables pretty much exactly what Jeremy suggests — something like
"refresh materialized view concurrently testview where type = 'main';" —
with fairly obvious semantics.

Welcome comments on the patch or approach.

denty.

(Seems I can't attach via the web interface, so copy/paste patch below.)





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Delta Materialized View Refreshes?

2018-11-10 Thread denty
Hi folks,

I've shared a new patch against 11.0, which seems to work as expected.
(Message ID 5100c2b3-641b-4a35-86d0-12ed2e618...@qqdd.eu.)

While playing with it, it is actually quite easy to get it confused. And so
I wonder — is it actually what we want?

For example, if I refresh including a WHERE that filters /out/ some content
presently in the MV, but filters /in/ some new content relating to those
same rows, then we predictably get a fail. Using the following example MV
MV, 'testview', AS SELECT test.type, test.message, count(1) AS count FROM
test GROUP BY test.type, test.message, then a refresh materialized view
concurrently testview where type = 'main' and count>2 hits:

ERROR:  duplicate key value violates unique constraint
"testview_type_message_idx"
DETAIL:  Key (type, message)=(main, hello world) already exists.
CONTEXT:  SQL statement "INSERT INTO public.testview SELECT (diff.newdata).*
FROM pg_temp_3.pg_temp_16390_2 diff WHERE tid IS NULL"

The message can probably be cleaned up. But the root cause is clearly in the
fact that REFRESH ... WHERE really needs to be used quite carefully.

I mused about restricting the WHERE clause Vars to allow reference only to
columns that are part of the MV's UNIQUE index. It seems it would prevent
the issue arising in my simple example, but is it always necessary? And
would it be overly restrictive? (For example: would it prevent people
issuing delta refreshes and including clauses that make the refresh
performant — because perhaps it helps the planner see a short cut to speedy
execution?)

On a different topic, in implementing it, I noticed that there is
rudimentary code-level support for incremental refreshes (see
Open/CloseMatViewIncrementalMaintenance() and
MatViewIncrementalMaintenanceIsEnabled()), but the facility is not
hook-able. There's another discussion (Flexible permissions for REFRESH
MATERIALIZED VIEW), and I wonder if a more interesting feature would be to
either allow the incremental refresh barriers to be hooked by extensions, or
even to offer a fine-grained permission that allows direct manipulation of
data in the MV's underlying table.

Far as I can see, allowing extensions to hook the incremental refresh APIs
would be trivial. Exposing the same via a fine-grained permission would
certainly be much harder but it might enable advanced delta-refresh
strategies to emerge that are written in high level languages such as
PL/pgSQL or Java (etc.) — that is certainly desirable.

denty.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Implementing Incremental View Maintenance

2018-12-31 Thread denty
Hi Yugo.

> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.

Great. :-)

I think it would address an important gap in PostgreSQL’s feature set.

> 2. How to compute the delta to be applied to materialized views
> 
> Essentially, IVM is based on relational algebra. Theorically, changes on
> base
> tables are represented as deltas on this, like "R <- R + dR", and the
> delta on
> the materialized view is computed using base table deltas based on "change
> propagation equations".  For implementation, we have to derive the
> equation from
> the view definition query (Query tree, or Plan tree?) and describe this as
> SQL
> query to compulte delta to be applied to the materialized view.

We had a similar discussion in this thread
https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu,
and I’m very much in agreement that the "change propagation equations”
approach can solve for a very substantial subset of common MV use cases.

> There could be several operations for view definition: selection,
> projection, 
> join,  aggregation, union, difference, intersection, etc.  If we can
> prepare a
> module for each operation, it makes IVM extensable, so we can start a
> simple 
> view definition, and then support more complex views.

Such a decomposition also allows ’stacking’, allowing complex MV definitions
to be attacked even with only a small handful of modules.

I did a bit of an experiment to see if "change propagation equations” could
be computed directly from the MV’s pg_node_tree representation in the
catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly
to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL
representation (I played with JSONB), then the next problem is making sense
of the structures, and unfortunately amongst the many plan/path/tree utility
functions in the code base, I figured only a very few could be sensibly
exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea,
and I think it could be made to work, native code is the way to go at least
for now.

> 4. When to maintain materialized views
> 
> [...]
> 
> In the previous discussion[4], it is planned to start from "eager"
> approach. In our PoC
> implementaion, we used the other aproach, that is, using REFRESH command
> to perform IVM.
> I am not sure which is better as a start point, but I begin to think that
> the eager
> approach may be more simple since we don't have to maintain base table
> changes in other
> past transactions.

Certainly the eager approach allows progress to be made with less
infrastructure.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

In the eager maintenance approache, we have to consider a race condition
where two
different transactions change base tables simultaneously as discussed in
[4].

I wonder if that nudges towards a logged approach. If the race is due to
fact of JOIN-worthy tuples been made visible after a COMMIT, but not before,
then does it not follow that the eager approach has to fire some kind of
reconciliation work at COMMIT time? That seems to imply a persistent queue
of some kind, since we can’t assume transactions to be so small to be able
to hold the queue in memory.

Hmm. I hadn’t really thought about that particular corner case. I guess a
‘catch' could be simply be to detect such a concurrent update and demote the
refresh approach by marking the MV stale awaiting a full refresh.

denty.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Delta Materialized View Refreshes?

2018-09-15 Thread denty
The idea of allowing a WHERE clause to be appended to REFRESH MATERIALIZED
VIEW seems useful.

It would enable those that know well the pattern of data modification in
their underlying use case to schedule delta-updates (say, from crontab).

And also it would be a useful as a foundation for more ambitious ideas. For
example, some folk can/wish to do something entirely custom that is
trigger-based. Others such as Nico's have toolkits that intend to cover
quite a lot of the automagical refresh cases (see this other thread
https://www.postgresql-archive.org/Query-Rewrite-for-Materialized-Views-FDW-Extension-td6015235.html),
and it allows the data to remain in an MVIEW to boot, rather than forcing
them to use normal tables instead. It could also be the foundation for
something much more integrated such as Corey's CONTINUOUS MATERIALIZED VIEW
concept.

So I'm certainly supportive of the idea.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html