Logical Replication - Type messages?

2021-11-09 Thread Stefen Hillman
Hello psql-general,


I have an application which processes logical replication output from a
PostgreSQL server. We create publications and read data from a replication
slot with the pgoutput plugin.

Postgres has various message formats for logical replication, as defined
here:
https://www.postgresql.org/docs/10/protocol-logicalrep-message-formats.html

Currently, I process Relation messages to get most of the information I
need to work with Inserts, Updates, and Deletes. However, for type
information I'm currently using a SQL query to get the column type
information. I wanted to use the Type messages, but I never see them. I see
Begin, Commit, etc. but never Type.

   - When are the Type messages generated and sent by the server?
   - Is there a way to cause them to be generated?
   - Are these type id > type mappings set in stone, or does PostgreSQL
   allow for a server to have custom types (I notice that namespace is one of
   the properties sent with the types)?

Any help is greatly appreciated - thank you!


Best regards,


Stefen Hillman


Query Timeout not working using jpa

2021-11-09 Thread Shubham Mittal
Hi Team,

Can anyone help in setting the query timeout in postgres 12 using spring
data jpa??

Currently using below code :

query.setHint("javax.persistence.query.timeout", 1);

Thanks & Regards,
Shubham


Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Avi Weinberg
Hi Experts,

I'm using logical replication and have the following open issue:
If the table on the publisher side has column with default value, should the 
table on the destination (subscriber) be defined with default value as well?

My intuition was not to define it with default value since it gets its values 
from publisher, but then I face the following issue that default value given to 
existing data when a column is added is not copied to the destination side.

on source/publisher side 

create table table1(id int primary key, a int);
insert into table1 select 1,1;
alter table table1 add column b int not null default 1;
select * from table1
output id,a, b:
1  1  1



 on destination/subscriber side ***
create table table1(id int primary key, a int);
select * from table1
alter table table1 add column b int;  -- I purposely defined it without default 
value
ALTER SUBSCRIPTION aaa REFRESH PUBLICATION
select * from table1
output id,a, b:
1  1  null
Why the (default) value given to column b for existing row is not synced to the 
subscriber.  Of course, for new rows the default value is copied to subscriber 
table.

Was this done on purpose, that default value for new column is not copied for 
existing data?  Does this mean that on destination side we must also define the 
table with default value?


If instead of the default on the publisher side I do the following it works and 
the value is copied to the subscriber.

do $$
begin
alter table table1 add column b int;
update table1 set b = 1;
ALTER TABLE table1 ALTER COLUMN b SET NOT NULL;
end $$;


IMPORTANT - This email and any attachments is intended for the above named 
addressee(s), and may contain information which is confidential or privileged. 
If you are not the intended recipient, please inform the sender immediately and 
delete this email: you should not copy or use this e-mail for any purpose nor 
disclose its contents to any person.


Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread David G. Johnston
On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg  wrote:

> Was this done on purpose, that default value for new column is not copied
> for existing data?  Does this mean that on destination side we must also
> define the table with default value?
>

Logical replication has two modes, initial synchronization and ongoing
change push.  The ongoing change push sends entire rows, when they change,
from the publisher to subscriber.

The initial sync happens once, when the subscriber initially subscribes to
the publication.

As entire rows are getting sent, defaults on the subscriber are immaterial
so far as the published rows are concerned.

If you run a command on the publisher that causes every row to change then
of course every row will be published with those new values to the
subscriber.

David J.


RE: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Avi Weinberg
Thanks David for the reply.

I also thought that default value on the subscriber side are immaterial.  
However, with the case I showed without having default value on subscriber side 
it get null when the following occurs:

  1.  Table was created with two columns on publisher and subscriber side
  2.  Data inserted into this table
  3.  A third column is added to table with default value on publisher side, 
but without default value on subscriber side
  4.  The default value column has value for existing rows on publisher, but 
null on the subscriber side.
  5.  Doing refresh publication etc. does not help and the column on subscriber 
side remains with nulls


Your input is most welcome


From: David G. Johnston [mailto:david.g.johns...@gmail.com]
Sent: Tuesday, November 9, 2021 5:55 PM
To: Avi Weinberg 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Logical Replication - Should Destination Table Columns Be Defined 
With Default Value

On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg 
mailto:a...@gilat.com>> wrote:
Was this done on purpose, that default value for new column is not copied for 
existing data?  Does this mean that on destination side we must also define the 
table with default value?

Logical replication has two modes, initial synchronization and ongoing change 
push.  The ongoing change push sends entire rows, when they change, from the 
publisher to subscriber.

The initial sync happens once, when the subscriber initially subscribes to the 
publication.

As entire rows are getting sent, defaults on the subscriber are immaterial so 
far as the published rows are concerned.

If you run a command on the publisher that causes every row to change then of 
course every row will be published with those new values to the subscriber.

David J.

IMPORTANT - This email and any attachments is intended for the above named 
addressee(s), and may contain information which is confidential or privileged. 
If you are not the intended recipient, please inform the sender immediately and 
delete this email: you should not copy or use this e-mail for any purpose nor 
disclose its contents to any person.


Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg  wrote:
>> Was this done on purpose, that default value for new column is not copied
>> for existing data?  Does this mean that on destination side we must also
>> define the table with default value?

> If you run a command on the publisher that causes every row to change then
> of course every row will be published with those new values to the
> subscriber.

I think the important point here is that the ALTER ADD COLUMN operation
*didn't* cause a physical update of every row on the publisher, thus
nothing got sent to the subscriber.

This is sort of annoying, because it is making what ought to be a purely
internal optimization user-visible.  Depending on the details of the
column default (is it null, is it a constant) and which PG version you
are talking about, there may or may not be a rewrite of the source table,
and here we see that that is semantically visible to subscribers.

I don't want to give up the aforesaid optimization --- it's a feature that
you can do common forms of ALTER ADD COLUMN in O(1) time.  But maybe we
ought to document the implications for logical replication better.

regards, tom lane




Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Mark Dilger



> On Nov 9, 2021, at 8:02 AM, Avi Weinberg  wrote:
> 
>   • A third column is added to table with default value on publisher 
> side, but without default value on subscriber side
>   • The default value column has value for existing rows on publisher, 
> but null on the subscriber side. 

See https://www.postgresql.org/docs/14/ddl-alter.html#DDL-ALTER-ADDING-A-COLUMN

The important part is this TIP:

From PostgreSQL 11, adding a column with a constant default value no longer 
means that each row of the table needs to be updated when the ALTER TABLE 
statement is executed. Instead, the default value will be returned the next 
time the row is accessed, and applied when the table is rewritten, making the 
ALTER TABLE very fast even on large tables.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Michael Lewis
Is there any advantage to not defining the default on the replica? If it is
not a static value and the publishing database will trigger row updates, I
could see waiting to set the default until after the table re-write is
done, but otherwise there doesn't seem to be any benefit to skipping column
defaults on subscribers.


Re: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Mark Dilger



> On Nov 9, 2021, at 1:24 PM, Michael Lewis  wrote:
> 
> Is there any advantage to not defining the default on the replica? If it is 
> not a static value and the publishing database will trigger row updates, I 
> could see waiting to set the default until after the table re-write is done, 
> but otherwise there doesn't seem to be any benefit to skipping column 
> defaults on subscribers.

That's a schema design decision.  I can't really make recommendations in the 
abstract.  As a general rule, I would expect that the DDL executed on the 
publisher (such as the ALTER TABLE..ADD COLUMN..DEFAULT..) would be the same as 
that executed on the subscriber, unless there is a particular reason to want 
different behavior on the subscriber.

Wanting different behavior is not unreasonable.  For example, if the subscriber 
exists merely to archive data from the publisher, the subscriber might not 
bother creating indexes over that data, even if such indexes exist on the 
publisher. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Understanding the differences between Temporal tables, CDC and Time Series.

2021-11-09 Thread SQL Padawan
Good morning to everyone,

I'm looking into Temporal Tables (TT - i.e. System/Application time versioning) 
but I would like to understand the differences between TTs and two other 
features (which appear similar in many ways) - Change Data Capture and Time 
Series.

Are Temporal Tables (in a sense) a query framework framework around CDC?

And if that's the case, what then are Time Series - which are specifically 
designed to host (time) changing data?

Are Temporal Tables (again, in a sense) Time Series databases which only record 
changes in time(-stamps)?

I'm unclear as to the "philosophical" distinctions here and would be grateful 
if anybody could explain the diffence(s) between them?

I did look at this myself and found these posts on StackOverflow, but I don't 
feel as if they've fundamentally explained the difference(s).

https://stackoverflow.com/questions/39165412/why-we-require-temporal-table-in-sql-server-2016-as-we-have-cdc-or-ct

https://stackoverflow.com/questions/800331/why-do-we-need-a-temporal-database

Any ideas, discussion, references, URLs welcome.

Thx,

SQLP!

Sent with [ProtonMail](https://protonmail.com/) Secure Email.

E.1.2. Migration to Version 14

2021-11-09 Thread Дмитрий Иванов
Hello!
E.1.2 Migration to Version 14

 array_append() ,
array_prepend(), array_cat(), array_position(), array_positions(),
array_remove(), array_replace(), and width_bucket()


Is this a complete list of functions? Or you need to remove all functions:
Table 9.52. Array Functions


Do the functions need to be deleted and restored, or can they be recreated?
I have a lot of them and they are called from other functions:

Uchet=# SELECT COUNT(pg_proc.proname) AS "COUNT"
Uchet-#FROM pg_proc
Uchet-#   WHERE pg_proc.prosrc ~~ '%array%'::text AND pg_proc.pronamespace
= 16443::oid;
 COUNT
---
89
(1 row)
Trying to estimate labor costs and migration opportunities.
Thanks!


Re: Understanding the differences between Temporal tables, CDC and Time Series.

2021-11-09 Thread Peter Eisentraut

On 10.11.21 07:43, SQL Padawan wrote:
I'm looking into Temporal Tables (TT - i.e. System/Application time 
versioning) but I would like to understand the differences between TTs 
and two other features (which appear similar in many ways) - Change Data 
Capture and Time Series.


Are Temporal Tables (in a sense) a query framework framework around CDC?

And if that's the case, what then are Time Series - which are 
specifically designed to host (time) changing data?


Are Temporal Tables (again, in a sense) Time Series databases which only 
record changes in time(-stamps)?


I'm unclear as to the "philosophical" distinctions here and would be 
grateful if anybody could explain the diffence(s) between them?


A time series database contains data records containing a time stamp (a 
point in time).  These records represent things that happened, for 
example log files, measurements, events.  This data is then usually used 
for analytics, such as, how many web site visitors did we have per hour 
over the last week.


A temporal database contains records that have a timestamp range (in 
PostgreSQL terms, also known as period or interval, delimited by two 
points in time).  There are (at least) two ways to use this.  One is 
that the database system automatically records when a record was 
inserted, updated, and deleted, so that historical versions of records 
are automatically kept, and you can then query, what was the value of 
this at this time, or more generally what was the result of this query 
at this time.  Use cases here are record-keeping, auditing, and the 
like.  The other is that the database users themselves enter when a 
record is supposed to be valid, and the database system takes this into 
account in queries.  So you could have product prices or salaries that 
automatically change at a certain time.  This is basically a tool to 
make development of certain database applications more convenient.


So these two concepts are really totally separate and independent 
things, except that they have a word based on "time" in their name.