Re: Watching for view changes
On Fri, 21 Dec 2018 23:41:16 -0800, Mitar wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between the previous
>version and the new one. But if you just want to know the new value, then I
>could imagine that (a simple implementation would) on every change to any
>underlying table check whether the change matches the selectors of the query
>and, if so, apply its operations/projections and produce the new value.

Yes, that could be done. But it isn't. In effect you are asking the DBMS also
to be a spreadsheet: i.e. change this cell and everything that depends on it
gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a DBMS, but
the combination would be an order of magnitude (or more) harder. Even object
graph databases don't offer spreadsheet functionality, and it would be a lot
easier to do there than in a table relational system.

>So yes, you need caching if you want to decrease CPU use, but you could also
>see it as new values being computed again and again through the query. Would
>such caching as you are mentioning really improve performance? I do not know,
>so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take thousands
of cycles [or millions if the data is on disk] to get those two numbers into
the multiplier.

There always are exceptions, but the general rule is that whenever the result
requires:
 - significant computation,
 - significant resources, or
 - significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if they
overflow the work buffer). A fetch of a table or index not in memory is simple
but takes a lot of time - as well as maybe pushing something else out
(increasing the complexity of a competing query).

>If we do go down the cache path, then I agree, materialized views seem nice,
>but I would prefer temporary materialized views: they should be cleaned up at
>the end of the session. Moreover, they should ideally be just in memory, not
>really on disk. Materialized views are currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables. Only
anonymous tables of query results can exist entirely in memory [and even they
can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and have
enough extra that you never run out of work buffers and never overflow a work
buffer. But that is the best you can achieve with PG.

George
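As an illustration of the cache-of-results idea above, a minimal sketch (the
tables and columns here are hypothetical, not from the thread):

  -- A materialized view stores the query result as a real relation on disk.
  CREATE MATERIALIZED VIEW order_totals AS
  SELECT c.id AS customer_id, sum(o.amount) AS total
  FROM customers c
  JOIN orders o ON o.customer_id = c.id
  GROUP BY c.id;

  -- The cached result does not follow its source tables automatically;
  -- it is recomputed only when asked.
  REFRESH MATERIALIZED VIEW order_totals;

  -- With a unique index in place, CONCURRENTLY allows reads during the refresh.
  CREATE UNIQUE INDEX ON order_totals (customer_id);
  REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;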
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Hi Mark,

It looks very weird. Can you try something like this (check that you have
enough disk space for a second copy of common_student first):

create table test_table AS SELECT * from common_student;
vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;

create table test_table AS SELECT * from common_student ORDER BY school_id;
vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;

And provide the results of both explain analyze queries.

On Sat, Dec 22, 2018 at 3:39 AM Mark wrote:
> Hi All,
>
> I'm seeing some very slow queries and it looks like the query planner is
> deciding to do a 'Nested Loop Left Join' which is slow. When I SET
> enable_nestloop=OFF for testing it does a 'Hash Left Join' which is much
> faster.
>
> I think I can see the cause of the problem in the examples below. I can't
> make sense of the statistics being written by ANALYSE or the planner's row
> estimates.
>
> Can anyone help me understand:
>
> - Why is the row estimate I get in the query below for school_id = 36 equal
>   to 1? (I expect it to be higher.)
> - Why does '1' appear in the most_common_vals when it is actually the least
>   common value?
> - Why doesn't 36 appear in the most_common_vals (it is more common than 1)?
> - Does the analyse output below mean that it only scanned 51538 of 65463
>   rows in the table? Is school_id 36 just being missed in the sample? (This
>   happens when the analyse is repeated.)
>
> Any help with understanding what's happening here would be much appreciated.
>
> I hope I've provided enough information below.
>
> Thanks,
>
> Mark
>
> db=> explain analyse select * from common_student where school_id = 36 ;
>                                   QUERY PLAN
> -----------------------------------------------------------------------------
>  Index Scan using idx_common_student_sid on common_student
>    (cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388 loops=1)
>    Index Cond: (school_id = 36)
>  Planning time: 0.087 ms
>  Execution time: 2.706 ms
> (4 rows)
>
> db=> select tablename, attname, most_common_vals, histogram_bounds, n_distinct
>      from pg_stats where attname = 'school_id' and tablename = 'common_student';
>    tablename    |  attname  | most_common_vals | histogram_bounds | n_distinct
> ----------------+-----------+------------------+------------------+------------
>  common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  | 45
> (1 row)
>
> db=> select count(distinct(school_id)) from common_student ;
>  count
> -------
>     55
> (1 row)
>
> db=> alter table common_student alter column school_id set statistics 100000;
> WARNING:  lowering statistics target to 10000
> ALTER TABLE
> db=> analyse verbose common_student(school_id);
> INFO:  analyzing "public.common_student"
> INFO:  "common_student": scanned 7322 of 7322 pages, containing 65463 live
>        rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total rows
> ANALYZE
> db=> select tablename, attname, most_common_vals, histogram_bounds, n_distinct
>      from pg_stats where attname = 'school_id' and tablename = 'common_student';
>    tablename    |  attname  | most_common_vals | histogram_bounds | n_distinct
> ----------------+-----------+------------------+------------------+------------
>  common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  | 45
> (1 row)
>
> db=> explain analyse select * from common_student where school_id = 36 ;
>                                   QUERY PLAN
> -----------------------------------------------------------------------------
>  Index Scan using idx_common_student_sid on common_student
>    (cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388 loops=1)
>    Index Cond: (school_id = 36)
>  Planning time: 0.334 ms
>  Execution time: 6.542 ms
> (4 rows)
>
> db=> select school_id, count(*) from common_student group by school_id
>      order by count(*) limit 6 ;
>  school_id | count
> -----------+-------
>          1 |    50
>         88 |   161
>         53 |   252
>         94 |   422
>         31 |   434
>         68 |   454
> (6 rows)
>
> dvpjxbzc=> select school_id, count(*) from common_st
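Not a fix, but a couple of catalog queries that can help when estimates look
this far off (the high dead-row count in the ANALYZE output above is worth a
look). This is only a sketch against the table named in the question:

  -- How bloated is the table, and when was it last (auto)vacuumed/analysed?
  SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum,
         last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 'common_student';

  -- Compare the planner's per-value statistics with the actual count.
  SELECT most_common_vals, most_common_freqs, n_distinct
  FROM pg_stats
  WHERE tablename = 'common_student' AND attname = 'school_id';

  SELECT count(*) FROM common_student WHERE school_id = 36;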
logical replication resiliency
I have a database with about 1 TB of data, mostly inserts, few updates and
deletes, and I want to set up 2 logical replicas of it. What should my
expectations be for the replication?

1. What is the time window that either of the nodes can be offline?
   - Will it survive if the node doing the publishing is offline for a few
     hours and then goes back online?
   - Will it survive if the subscriber node is offline for a few hours and
     then goes back online?
2. If the replication is aborted for whatever reason, is there a way to resume
   it incrementally, or will it re-sync the 1 TB at every such event?
3. What alternative tools are out there to e.g. periodically sync from one
   database to another?

Thanks,
Istvan
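For context, the basic setup being asked about looks roughly like this (a
sketch only; publication, subscription and connection names are made up):

  -- On the publishing node:
  CREATE PUBLICATION big_db_pub FOR ALL TABLES;

  -- On each subscriber (the tables must already exist there, e.g. created
  -- from a schema-only dump of the publisher):
  CREATE SUBSCRIPTION big_db_sub
      CONNECTION 'host=publisher.example.com dbname=bigdb user=replicator'
      PUBLICATION big_db_pub;

Each subscription keeps a replication slot on the publisher, so WAL is
retained while a subscriber is offline; publisher disk space is therefore the
practical limit on how long a subscriber can stay down.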
Re: Watching for view changes
Hi!

On Sun, Dec 23, 2018 at 1:00 AM George Neuner wrote:
> A spreadsheet is an order of magnitude simpler to implement than a
> DBMS, but the combination would be an order of magnitude (or more)
> harder. Even object graph databases don't offer spreadsheet
> functionality, and it would be a lot easier to do there than in a
> table relational system.

But having that readily available would be so cool. :-) Especially because it
is hard. And it does not necessarily have to be a full graph. Just one level
of dependencies. Then those are recomputed. And then anything depending on the
now-changed values gets recomputed again. And so on. So there is no need to
traverse the whole graph at once.

> In PG, all *named* tables are backed on disk - even temporary tables.
> Only anonymous tables of query results can exist entirely in memory
> [and even they can spill onto disk when necessary].
>
> With enough memory you can cache all your tables in shared buffers and
> have enough extra that you never run out of work buffers and never
> overflow a work buffer. But that is the best you can achieve with PG.

Thanks for all this input. So I am now thinking in terms of materialized
views. For my case it would be great if materialized views could be TEMPORARY
(removed at the end of the session), in-memory (and if PostgreSQL runs out of
buffer space for it, I would prefer an error instead of spilling to disk), and
UNLOGGED. Are there any particular reasons which would prevent them from being
implemented as such? I thought that materialized views are just a fancy table
with a stored query, so that you can easily REFRESH them instead of doing that
yourself.

I can then wire triggers on the underlying tables to REFRESH materialized
views automatically. Is there some easy way to debounce those refreshes? If I
just blindly trigger REFRESH in every row trigger, this could do a lot of
work. I would prefer to combine all changes, for example, into 100 ms batches
and refresh only once per 100 ms. So in some way I would like to be able to
control the level of real-time I need. I have found a blog post [1] about
this, but it seems pretty tricky and requires an outside cron job. For a 100
ms batching time this feels like something better done inside PostgreSQL
itself.

The last question is how I get changes in materialized views streamed to the
client. It seems one option is a trigger on the materialized view which uses
NOTIFY to tell the client about the change. But NOTIFY has a limit on the
payload size, so I cannot just send the change to the client. It seems I would
need an additional table to store the change; the client would then get the
notification, read from that table, and remove the rows it has read. So in a
way I would need my own queue for changes. Any other suggestions for how to do
that?

I looked into PUBLICATION and SUBSCRIPTION, but it seems this is only
supported between servers, not server-client, and also works only on base
tables, not materialized views (not sure again why, because aren't
materialized views just tables?). Would it be possible to use a client to
subscribe instead of a server?

[1] https://onethingsimple.com/2017/10/sync-materialized-views-after-debounce-period/


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
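A minimal sketch of the trigger-driven refresh plus NOTIFY idea (the names
my_view, my_base_table and the channel are made up; it does not attempt the
100 ms debouncing discussed above, so every modifying statement triggers a
full refresh):

  CREATE OR REPLACE FUNCTION refresh_my_view() RETURNS trigger AS $$
  BEGIN
      -- Recompute the cached result, then tell listeners which view changed.
      -- The payload is only the view name, so it stays well under the NOTIFY
      -- payload limit; clients re-read the view themselves.
      REFRESH MATERIALIZED VIEW my_view;
      PERFORM pg_notify('view_changed', 'my_view');
      RETURN NULL;  -- return value is ignored for AFTER statement triggers
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER refresh_my_view_trigger
  AFTER INSERT OR UPDATE OR DELETE ON my_base_table
  FOR EACH STATEMENT EXECUTE FUNCTION refresh_my_view();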
NOTIFY/LISTEN with ODBC interface
Hi,

Is there a way to write the code found at
https://www.postgresql.org/docs/9.1/libpq-example.html (Example 2)
with the ODBC interface?

Thank you.
Determine in a trigger if UPDATE query really changed anything
Hi!

Currently I am doing:

CREATE TRIGGER some_trigger AFTER UPDATE ON my_table
  REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table
  FOR EACH STATEMENT EXECUTE FUNCTION my_trigger();

In my trigger I do:

PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table)
  UNION ALL
  (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
IF FOUND THEN ...

But I wonder if there is an easier way. I would just like to know if an UPDATE
really changed anything.

For DELETE I do "PERFORM * FROM old_table LIMIT 1" and for INSERT I do
"PERFORM * FROM new_table LIMIT 1" and I think this is reasonable. Still, I am
not sure why I have to store the whole relation just to know if the statement
really changed anything.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
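One alternative to comparing the transition tables yourself: for row-level
triggers, the WHEN clause can compare the whole old and new rows, so the
trigger only fires when a row actually changed. A sketch (my_row_trigger() is
a hypothetical row-level counterpart of the function above):

  CREATE TRIGGER some_row_trigger
  AFTER UPDATE ON my_table
  FOR EACH ROW
  WHEN (OLD.* IS DISTINCT FROM NEW.*)
  EXECUTE FUNCTION my_row_trigger();

The trade-off is that this fires once per changed row rather than once per
statement, so it answers "did anything change" as a stream of row events
instead of a single statement-level check.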
Re: Watching for view changes
Hi!

On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote:
> A materialized view IS exactly such a deliberate cache of results from
> applying a view. It is a real table that can be monitored for changes
> using INSERT, UPDATE and/or DELETE triggers.

Are you sure one can use triggers on a materialized view? I am getting:

"my_materialized_view" is not a table or view

as an error when I am trying to create a trigger on a materialized view.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Re: Watching for view changes
On Sun, 23 Dec 2018 23:06:51 -0800, Mitar wrote:

>On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Are you sure one can use triggers on a materialized view? I am getting:
>
>"my_materialized_view" is not a table or view
>
>as an error when I am trying to create a trigger on a materialized view.

IIRC the cache table's name is generated. I don't know the proper incantations
to get it from the catalogs, but an easy way to find it is to put the
materialized view into its own tablespace, then search pg_tables for objects
in that space.

George
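A hedged sketch of that catalog search, using pg_class rather than pg_tables
(materialized views do not show up in pg_tables); the tablespace name
matview_space is made up:

  -- List relations stored in a given tablespace.
  -- relkind 'm' marks materialized views, 'r' ordinary tables.
  SELECT c.relname, c.relkind
  FROM pg_class c
  JOIN pg_tablespace t ON t.oid = c.reltablespace
  WHERE t.spcname = 'matview_space';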