Re: Watching for view changes
On Fri, 21 Dec 2018 23:41:16 -0800, Mitar wrote:

>Hi!
>
>On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Caching is needed if you want to compute a difference between the previous
>version and the new one. But if you just want to know the new value, then I
>could imagine that (a simple implementation would) on every change to any
>underlying table check whether the change matches the selectors of the query
>and, if so, apply its operations/projections and produce the new value.

Yes, that could be done. But it isn't. In effect you are asking the DBMS also
to be a spreadsheet: i.e. change this cell and everything that depends on it
gets recomputed.

A spreadsheet is an order of magnitude simpler to implement than a DBMS, but
the combination would be an order of magnitude (or more) harder. Even object
graph databases don't offer spreadsheet functionality, and it would be a lot
easier to do there than in a table relational system.

>So yes, you need caching if you want to decrease CPU use, but you could also
>see it as new values being computed again and again through the query. Would
>such caching as you are mentioning really improve performance? I do not know,
>so it might be premature optimization?

It may take only 3 cycles to multiply two numbers, but it can take thousands
of cycles [or millions if the data is on disk] to get those two numbers into
the multiplier.

There always are exceptions, but the general rule is that whenever the result
requires:
 - significant computation,
 - significant resources, or
 - significant time
then you should cache the result instead of recomputing it.

Joins and sorts can take a whole lot of memory (and spill onto disk if they
overflow the work buffer). A fetch of a table or index not in memory is simple
but takes a lot of time - as well as maybe pushing something else out
(increasing the complexity of a competing query).

>If we do go down the cache path, then I agree, materialized views seem nice,
>but I would prefer temporary materialized views: they should be cleaned up at
>the end of the session. Moreover, they should ideally be just in memory, not
>really on disk. Materialized views are currently stored to disk, no?

In PG, all *named* tables are backed on disk - even temporary tables. Only
anonymous tables of query results can exist entirely in memory [and even they
can spill onto disk when necessary].

With enough memory you can cache all your tables in shared buffers and have
enough extra that you never run out of work buffers and never overflow a work
buffer. But that is the best you can achieve with PG.

George
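As an illustration of the cache-of-results idea above, a minimal sketch (the
tables and columns here are hypothetical, not from the thread):

  -- A materialized view stores the query result as a real relation on disk.
  CREATE MATERIALIZED VIEW order_totals AS
  SELECT c.id AS customer_id, sum(o.amount) AS total
  FROM customers c
  JOIN orders o ON o.customer_id = c.id
  GROUP BY c.id;

  -- The cached result does not follow its source tables automatically;
  -- it is recomputed only when asked.
  REFRESH MATERIALIZED VIEW order_totals;

  -- With a unique index in place, CONCURRENTLY allows reads during the refresh.
  CREATE UNIQUE INDEX ON order_totals (customer_id);
  REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;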
Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Hi Mark,

It looks very weird. Can you try something like this (check that you have
enough disk space for a second copy of common_student first):

create table test_table AS SELECT * from common_student;
vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;

create table test_table AS SELECT * from common_student ORDER BY school_id;
vacuum analyze test_table;
explain analyze select * from test_table where school_id = 36;
drop table test_table;

And provide the results of both explain analyze queries.

On Sat, Dec 22, 2018 at 3:39 AM Mark wrote:
> Hi All,
>
> I'm seeing some very slow queries and it looks like the query planner is
> deciding to do a 'Nested Loop Left Join' which is slow. When I SET
> enable_nestloop=OFF for testing it does a 'Hash Left Join' which is much
> faster.
>
> I think I can see the cause of the problem in the examples below. I can't
> make sense of the statistics being written by ANALYSE or the planner's row
> estimates.
>
> Can anyone help me understand:
>
> - Why is the row estimate I get in the query below for school_id = 36 equal
>   to 1? (I expect it to be higher.)
> - Why does '1' appear in the most_common_vals when it is actually the least
>   common value?
> - Why doesn't 36 appear in the most_common_vals (it is more common than 1)?
> - Does the analyse output below mean that it only scanned 51538 of 65463
>   rows in the table? Is school_id 36 just being missed in the sample? (This
>   happens when the analyse is repeated.)
>
> Any help with understanding what's happening here would be much appreciated.
>
> I hope I've provided enough information below.
>
> Thanks,
>
> Mark
>
> db=> explain analyse select * from common_student where school_id = 36 ;
>                                   QUERY PLAN
> -----------------------------------------------------------------------------
>  Index Scan using idx_common_student_sid on common_student
>    (cost=0.41..8.39 rows=1 width=385) (actual time=0.264..1.691 rows=1388 loops=1)
>    Index Cond: (school_id = 36)
>  Planning time: 0.087 ms
>  Execution time: 2.706 ms
> (4 rows)
>
> db=> select tablename, attname, most_common_vals, histogram_bounds, n_distinct
>      from pg_stats where attname = 'school_id' and tablename = 'common_student';
>    tablename    |  attname  | most_common_vals | histogram_bounds | n_distinct
> ----------------+-----------+------------------+------------------+------------
>  common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  | 45
> (1 row)
>
> db=> select count(distinct(school_id)) from common_student ;
>  count
> -------
>     55
> (1 row)
>
> db=> alter table common_student alter column school_id set statistics 100000;
> WARNING:  lowering statistics target to 10000
> ALTER TABLE
> db=> analyse verbose common_student(school_id);
> INFO:  analyzing "public.common_student"
> INFO:  "common_student": scanned 7322 of 7322 pages, containing 65463 live
>        rows and 49026 dead rows; 51538 rows in sample, 65463 estimated total rows
> ANALYZE
> db=> select tablename, attname, most_common_vals, histogram_bounds, n_distinct
>      from pg_stats where attname = 'school_id' and tablename = 'common_student';
>    tablename    |  attname  | most_common_vals | histogram_bounds | n_distinct
> ----------------+-----------+------------------+------------------+------------
>  common_student | school_id | {79,60,25,61,59,69,86,77,64,33,56,78,58,81,41,97,22,67,38,23,3,72,92,93,48,24,96,26,75,90,70,52,51,21,14,91,83,54,85,11,68,94,53,88,1} |                  | 45
> (1 row)
>
> db=> explain analyse select * from common_student where school_id = 36 ;
>                                   QUERY PLAN
> -----------------------------------------------------------------------------
>  Index Scan using idx_common_student_sid on common_student
>    (cost=0.41..8.39 rows=1 width=385) (actual time=0.542..4.022 rows=1388 loops=1)
>    Index Cond: (school_id = 36)
>  Planning time: 0.334 ms
>  Execution time: 6.542 ms
> (4 rows)
>
> db=> select school_id, count(*) from common_student group by school_id
>      order by count(*) limit 6 ;
>  school_id | count
> -----------+-------
>          1 |    50
>         88 |   161
>         53 |   252
>         94 |   422
>         31 |   434
>         68 |   454
> (6 rows)
>
> dvpjxbzc=> select school_id, count(*) from common_st
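Not a fix, but a couple of catalog queries that can help when estimates look
this far off (the high dead-row count in the ANALYZE output above is worth a
look). This is only a sketch against the table named in the question:

  -- How bloated is the table, and when was it last (auto)vacuumed/analysed?
  SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum,
         last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 'common_student';

  -- Compare the planner's per-value statistics with the actual count.
  SELECT most_common_vals, most_common_freqs, n_distinct
  FROM pg_stats
  WHERE tablename = 'common_student' AND attname = 'school_id';

  SELECT count(*) FROM common_student WHERE school_id = 36;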
logical replication resiliency
I have a database with about 1 TB of data, mostly inserts, few updates and
deletes, and I want to set up 2 logical replicas of it. What should my
expectations be for the replication?

1. What is the time window that either of the nodes can be offline?
   - Will it survive if the node doing the publishing is offline for a few
     hours and then goes back online?
   - Will it survive if the subscriber node is offline for a few hours and
     then goes back online?
2. If the replication is aborted for whatever reason, is there a way to resume
   it incrementally, or will it re-sync the 1 TB at every such event?
3. What alternative tools are out there to e.g. periodically sync from one
   database to another?

Thanks,
Istvan
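For context, the basic setup being asked about looks roughly like this (a
sketch only; publication, subscription and connection names are made up):

  -- On the publishing node:
  CREATE PUBLICATION big_db_pub FOR ALL TABLES;

  -- On each subscriber (the tables must already exist there, e.g. created
  -- from a schema-only dump of the publisher):
  CREATE SUBSCRIPTION big_db_sub
      CONNECTION 'host=publisher.example.com dbname=bigdb user=replicator'
      PUBLICATION big_db_pub;

Each subscription keeps a replication slot on the publisher, so WAL is
retained while a subscriber is offline; publisher disk space is therefore the
practical limit on how long a subscriber can stay down.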
Re: Watching for view changes
Hi!

On Sun, Dec 23, 2018 at 1:00 AM George Neuner wrote:
> A spreadsheet is an order of magnitude simpler to implement than a
> DBMS, but the combination would be an order of magnitude (or more)
> harder. Even object graph databases don't offer spreadsheet
> functionality, and it would be a lot easier to do there than in a
> table relational system.

But having that readily available would be so cool. :-) Especially because it
is hard. And it does not necessarily have to be a full graph. Just one level
of dependencies. Then those are recomputed. And then anything depending on the
now-changed values gets recomputed again. And so on. So there is no need to
traverse the whole graph at once.

> In PG, all *named* tables are backed on disk - even temporary tables.
> Only anonymous tables of query results can exist entirely in memory
> [and even they can spill onto disk when necessary].
>
> With enough memory you can cache all your tables in shared buffers and
> have enough extra that you never run out of work buffers and never
> overflow a work buffer. But that is the best you can achieve with PG.

Thanks for all this input. So I am now thinking in terms of materialized
views. For my case it would be great if materialized views could be TEMPORARY
(removed at the end of the session), in-memory (and if PostgreSQL runs out of
buffer space for it, I would prefer an error instead of spilling to disk), and
UNLOGGED. Are there any particular reasons which would prevent them from being
implemented as such? I thought that materialized views are just a fancy table
with a stored query, so that you can easily REFRESH them instead of doing that
yourself.

I can then wire triggers on the underlying tables to REFRESH materialized
views automatically. Is there some easy way to debounce those refreshes? If I
just blindly trigger REFRESH in every row trigger, this could do a lot of
work. I would prefer to combine all changes, for example, into 100 ms batches
and refresh only once per 100 ms. So in some way I would like to be able to
control the level of real-time I need. I have found a blog post [1] about
this, but it seems pretty tricky and requires an outside cron job. For a 100
ms batching time this feels like something better done inside PostgreSQL
itself.

The last question is how I get changes in materialized views streamed to the
client. It seems one option is a trigger on the materialized view which uses
NOTIFY to tell the client about the change. But NOTIFY has a limit on the
payload size, so I cannot just send the change to the client. It seems I would
need an additional table to store the change; the client would then get the
notification, read from that table, and remove the rows it has read. So in a
way I would need my own queue for changes. Any other suggestions for how to do
that?

I looked into PUBLICATION and SUBSCRIPTION, but it seems this is only
supported between servers, not server-client, and also works only on base
tables, not materialized views (not sure again why, because aren't
materialized views just tables?). Would it be possible to use a client to
subscribe instead of a server?

[1] https://onethingsimple.com/2017/10/sync-materialized-views-after-debounce-period/


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
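A minimal sketch of the trigger-driven refresh plus NOTIFY idea (the names
my_view, my_base_table and the channel are made up; it does not attempt the
100 ms debouncing discussed above, so every modifying statement triggers a
full refresh):

  CREATE OR REPLACE FUNCTION refresh_my_view() RETURNS trigger AS $$
  BEGIN
      -- Recompute the cached result, then tell listeners which view changed.
      -- The payload is only the view name, so it stays well under the NOTIFY
      -- payload limit; clients re-read the view themselves.
      REFRESH MATERIALIZED VIEW my_view;
      PERFORM pg_notify('view_changed', 'my_view');
      RETURN NULL;  -- return value is ignored for AFTER statement triggers
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER refresh_my_view_trigger
  AFTER INSERT OR UPDATE OR DELETE ON my_base_table
  FOR EACH STATEMENT EXECUTE FUNCTION refresh_my_view();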
NOTIFY/LISTEN with ODBC interface
Hi,

Is there a way to write the code found at
https://www.postgresql.org/docs/9.1/libpq-example.html (Example 2)
with the ODBC interface?

Thank you.
Determine in a trigger if UPDATE query really changed anything
Hi!

Currently I am doing:

CREATE TRIGGER some_trigger AFTER UPDATE ON my_table
  REFERENCING NEW TABLE AS new_table OLD TABLE AS old_table
  FOR EACH STATEMENT EXECUTE FUNCTION my_trigger();

In my trigger I do:

PERFORM * FROM ((TABLE old_table EXCEPT TABLE new_table)
  UNION ALL
  (TABLE new_table EXCEPT TABLE old_table)) AS differences LIMIT 1;
IF FOUND THEN ...

But I wonder if there is an easier way. I would just like to know if an UPDATE
really changed anything.

For DELETE I do "PERFORM * FROM old_table LIMIT 1" and for INSERT I do
"PERFORM * FROM new_table LIMIT 1" and I think this is reasonable. Still, I am
not sure why I have to store the whole relation just to know if the statement
really changed anything.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
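One alternative to comparing the transition tables yourself: for row-level
triggers, the WHEN clause can compare the whole old and new rows, so the
trigger only fires when a row actually changed. A sketch (my_row_trigger() is
a hypothetical row-level counterpart of the function above):

  CREATE TRIGGER some_row_trigger
  AFTER UPDATE ON my_table
  FOR EACH ROW
  WHEN (OLD.* IS DISTINCT FROM NEW.*)
  EXECUTE FUNCTION my_row_trigger();

The trade-off is that this fires once per changed row rather than once per
statement, so it answers "did anything change" as a stream of row events
instead of a single statement-level check.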
Re: Watching for view changes
Hi!

On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote:
> A materialized view IS exactly such a deliberate cache of results from
> applying a view. It is a real table that can be monitored for changes
> using INSERT, UPDATE and/or DELETE triggers.

Are you sure one can use triggers on a materialized view? I am getting:

"my_materialized_view" is not a table or view

as an error when I am trying to create a trigger on a materialized view.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Re: Watching for view changes
On Sun, 23 Dec 2018 23:06:51 -0800, Mitar wrote:

>On Fri, Dec 21, 2018 at 11:10 PM George Neuner wrote:
>> A materialized view IS exactly such a deliberate cache of results from
>> applying a view. It is a real table that can be monitored for changes
>> using INSERT, UPDATE and/or DELETE triggers.
>
>Are you sure one can use triggers on a materialized view? I am getting:
>
>"my_materialized_view" is not a table or view
>
>as an error when I am trying to create a trigger on a materialized view.

IIRC the cache table's name is generated. I don't know the proper incantations
to get it from the catalogs, but an easy way to find it is to put the
materialized view into its own tablespace, then search pg_tables for objects
in that space.

George
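A hedged sketch of that catalog search, using pg_class rather than pg_tables
(materialized views do not show up in pg_tables); the tablespace name
matview_space is made up:

  -- List relations stored in a given tablespace.
  -- relkind 'm' marks materialized views, 'r' ordinary tables.
  SELECT c.relname, c.relkind
  FROM pg_class c
  JOIN pg_tablespace t ON t.oid = c.reltablespace
  WHERE t.spcname = 'matview_space';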