AW: PostgreSQL HA FO question

2019-12-11 Thread Zwettler Markus (OIZ)
Based on production experience.



From: Dor Ben Dov
Sent: Monday, December 9, 2019 13:24
To: Zwettler Markus (OIZ); pgsql-general@lists.postgresql.org
Subject: RE: PostgreSQL HA FO question

Hi Zwettler,

Based on what comparison, production experience ?

Regards,
Dor

From: Zwettler Markus (OIZ) <markus.zwett...@zuerich.ch>
Sent: Monday, December 9, 2019 1:13 PM
To: Dor Ben Dov <dor.ben-...@amdocs.com>; pgsql-general@lists.postgresql.org
Subject: AW: PostgreSQL HA FO question

In my opinion, the best open source product is Patroni.



From: Dor Ben Dov <dor.ben-...@amdocs.com>
Sent: Monday, December 9, 2019 11:24
To: pgsql-general@lists.postgresql.org
Subject: PostgreSQL HA FO question

Hi everyone,


What is the best / most common / production-proven open source solution for HA /
failover / backup and restore?
Open source community / Commercial

Best Regards,
Dor Ben Dov


Re: PGUSER and initdb

2019-12-11 Thread Олег Самойлов


> "This utility, like most other PostgreSQL utilities, also uses the environment
> variables supported by libpq (see Section 33.14)."
>
> https://www.postgresql.org/docs/12/libpq-envars.html
>
> "PGUSER behaves the same as the user connection parameter."

This doesn't work with `initdb` or `pg_ctl initdb`.



Re: tcp keep alive don't work when the backend is busy

2019-12-11 Thread Олег Самойлов


> On Dec 10, 2019, at 18:36, Justin  wrote:
> 
> Hi Oner
> 
> It appears that you are looking for a way to detect and kill idle connections
> or processes that are running for a long time. Correct?

Nope, not idle. Only to stop an active connection if the client has already died.
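
For context, a minimal sketch of the keepalive knobs involved (the parameter
names are standard server settings; the values are illustrative only). Note that
the kernel only reports a dead peer while the backend is actually waiting on the
socket, which is the crux here: a backend busy running a query does not touch the
socket and so does not notice until it next reads from or writes to the client.

-- Illustrative values; pick what fits your network.
ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- seconds of idle before the first probe
ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between probes
ALTER SYSTEM SET tcp_keepalives_count = 3;      -- failed probes before the kernel drops the connection
SELECT pg_reload_conf();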





Re: Identity columns, DEFAULT keyword and multi-row inserts

2019-12-11 Thread Peter Eisentraut

On 2019-12-10 17:53, Tom Lane wrote:

However, a multi-row insert like the following:
 insert into test (id, data)
 values
   (default,1),
   (default,2);
fails with:
 ERROR: cannot insert into column "id"
   Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
   Hint: Use OVERRIDING SYSTEM VALUE to override.

Yeah, in principle a multi-row INSERT could grovel through all the
rows of the VALUES clause and confirm that every one of them has
DEFAULT there.  Not sure it's worth the trouble, but if it's not
a lot of code then maybe.  It certainly seems a bit inconsistent.


It looks like the multi-row case in transformInsertStmt() would have to 
develop a bit more smarts to discover this case and then replace the RTE 
reference in the target list with a single SetToDefault node?
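
For readers hitting this today, a sketch of the forms that I believe already work
(assuming the same test table with an id column declared GENERATED ALWAYS AS
IDENTITY):

-- Omit the identity column entirely; works for any number of rows.
insert into test (data) values (1), (2);

-- Or supply explicit values and override the identity generation.
insert into test (id, data) overriding system value values (100, 1), (101, 2);

-- A single-row DEFAULT is accepted; only the multi-row VALUES case is rejected.
insert into test (id, data) values (default, 1);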


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




AW: AW: secure deletion of archived logs

2019-12-11 Thread Zwettler Markus (OIZ)
> 
> Greetings,
> 
> * Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> > We use "rsync" on XFS with "wsync" mount mode. I think this should do the 
> > job?
> 
> No, that just makes sure that namespace operations are executed synchronously,
> that doesn't provide any guarantee that the data has actually been written 
> out and
> sync'd.
> 

What else? "rsync" on XFS mounted wsync + execute "sync" afterwards?


> > The tools mentioned will all do backup to disk.
> > We are required to do backup to tape.
> 
> Back up to disk first and then tar to tape.
> 
> Thanks,
> 
> Stephen


Markus




Re: AW: AW: secure deletion of archived logs

2019-12-11 Thread Stephen Frost
Greetings,

* Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> > * Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> > > We use "rsync" on XFS with "wsync" mount mode. I think this should do the 
> > > job?
> > 
> > No, that just makes sure that namespace operations are executed 
> > synchronously,
> > that doesn't provide any guarantee that the data has actually been written 
> > out and
> > sync'd.
> 
> What else? "rsync" on XFS mounted wsync + execute "sync" afterwards?

I don't really agree with it, but pitrery uses 'dd' with 'conv=fsync'.
As far as I know, there isn't a way to force 'rsync' to run an fsync()
at the end, and executing a 'sync' afterwards, while it should work,
seems likely to cause you a lot more trouble.
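
For the archives, a rough sketch of what that dd-based approach looks like as an
archive_command (the destination path is a placeholder, and the next paragraph's
advice to use an existing tool still stands):

archive_command = 'dd if=%p of=/backup/wal/%f bs=8192 conv=fsync'

conv=fsync makes dd fsync() the output file before it exits, which is the
guarantee a plain cp or rsync doesn't give you.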

In the end, I wouldn't suggest trying to hack up your own scripts to do
any of this- use one of the existing tools.  It's a lot more complicated
than I think you're appreciating.

Thanks,

Stephen




Re: Logical Replication of Multiple Schema Versions

2019-12-11 Thread Peter Eisentraut

On 2019-12-10 08:55, Dan shmidt wrote:

What is the correct way to perform such an operation?
Is there a way to keep constraint #1, or is the only option to not allow 
"breaking" schema changes between versions?


It all depends on the specific schema changes you want to make.  You can 
add columns on the subscriber and remove columns on the publisher 
without breaking things (unless there are not-null constraints). 
Renaming columns will break replication until you rename them 
everywhere.  Column type changes will usually just work as long as the 
data fits into both the old and the new type.


You really need to carefully plan and test each class of scenarios 
separately.
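
A minimal sketch of the "safe" direction described above (table and column names
are invented):

-- On the subscriber first: an extra column there does not break apply;
-- incoming rows simply leave it at its default.
ALTER TABLE orders ADD COLUMN note text;

-- On the publisher: dropping a column the subscriber still carries is fine too.
ALTER TABLE orders DROP COLUMN legacy_flag;

-- A rename, by contrast, breaks apply until it has been run everywhere:
-- ALTER TABLE orders RENAME COLUMN note TO comment;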


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PGUSER and initdb

2019-12-11 Thread Adrian Klaver

On 12/11/19 5:18 AM, Олег Самойлов wrote:



"This utility, like most other PostgreSQL utilities, also uses the environment
variables supported by libpq (see Section 33.14)."

https://www.postgresql.org/docs/12/libpq-envars.html

"PGUSER behaves the same as the user connection parameter."


This doesn't work with `initdb` or `pg_ctl initdb`.



Hmm, yeah. So -U works but not the env variable. I would file a bug 
report (https://www.postgresql.org/account/login/?next=/account/submitbug/):


aklaver@tito:~> env | grep PGUSER
PGUSER=postgres

/usr/local/pgsql12/bin/initdb   -D pg_test_postgres/
/usr/local/pgsql12/bin/pg_ctl -D pg_test_postgres start

aklaver@tito:~> psql -d postgres -U postgres -p 5429
2019-12-11 08:25:22.334 PST [4978] FATAL:  role "postgres" does not exist
psql: error: could not connect to server: FATAL:  role "postgres" does 
not exist

aklaver@tito:~> psql -d postgres -U aklaver -p 5429
Null display is "NULL".
psql (12.1)
Type "help" for help.

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 aklaver   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}



/usr/local/pgsql12/bin/initdb -U postgres  -D pg_test_postgres/

aklaver@tito:~> psql -d postgres -U postgres -p 5429
Null display is "NULL".
psql (12.1)
Type "help" for help.

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: PGUSER and initdb

2019-12-11 Thread Tom Lane
Adrian Klaver  writes:
> On 12/11/19 5:18 AM, Олег Самойлов wrote:
> This utility, like most other PostgreSQL utilities, also uses the environment 
> variables supported by libpq (see Section 33.14).
> https://www.postgresql.org/docs/12/libpq-envars.html
> PGUSER behaves the same as the user connection parameter.
>> This doesn't work with `initdb` or `pg_ctl initdb`.

This seems like nonsense to me.  The initdb documentation says perfectly
clearly that the initial superuser's name is set by -U.  It does not
say anything about PGUSER.  I think the reference to libpq in the docs
probably should be taken out, because initdb never uses libpq to make
a connection, so libpq's defaults are not relevant.

regards, tom lane




Re: PGUSER and initdb

2019-12-11 Thread Adrian Klaver

On 12/11/19 9:06 AM, Tom Lane wrote:

Adrian Klaver  writes:

On 12/11/19 5:18 AM, Олег Самойлов wrote:
This utility, like most other PostgreSQL utilities, also uses the environment 
variables supported by libpq (see Section 33.14).
https://www.postgresql.org/docs/12/libpq-envars.html
PGUSER behaves the same as the user connection parameter.

This doesn't work with `initdb` or `pg_ctl initdb`.


This seems like nonsense to me.  The initdb documentation says perfectly
clearly that the initial superuser's name is set by -U.  It does not
say anything about PGUSER.  I think the reference to libpq in the docs
probably should be taken out, because initdb never uses libpq to make
a connection, so libpq's defaults are not relevant.


I am not seeing it as nonsense. The OP made a valid observation: PGUSER 
is not used by the program even though the docs say it should be. If 
what you say about libpq is also valid for the other programs listed 
here:


https://www.postgresql.org/docs/12/reference-server.html

then the docs probably need to be gone over for them also:

https://www.postgresql.org/docs/12/app-pg-ctl.html

"pg_ctl, like most other PostgreSQL utilities, also uses the environment 
variables supported by libpq (see Section 33.14)."




regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-11 Thread Erwin Brandstetter
Thanks for the suggestion. Seems like a good assumption and I have been
using hashtext() in the past. But I am uncertain whether it is the best
option.

Guess Tom's warning in
https://www.postgresql.org/message-id/9434.1568839...@sss.pgh.pa.us about
portability only refers to hashtextextended() and friends not being there
in Postgres 10 or older.

But why are none of these functions documented? Does the project still not
...

> want people to rely on them continuing to have exactly the current
behavior.

I am not complaining; maybe nobody just did the work. But it's also
mentioned in this old thread that hashtext() changed in the past. Is all of
that outdated, and are we welcome to use those functions for indexing?
https://www.postgresql.org/message-id/flat/24463.1329854466%40sss.pgh.pa.us#c18e62281dc78f6d64c1a4d41ab8569b


Filtering with amprocnum = 2 gets functions producing bigint in Postgres 11
or later.  Not sure about the exact meaning of amprocnum; the manual says
"Support function number".

Remaining problem either way: no hash function returning bigint for
Postgres 10.

Regards
Erwin

On Tue, Dec 10, 2019 at 11:13 PM Laurenz Albe 
wrote:

> On Tue, 2019-12-10 at 22:11 +0100, Erwin Brandstetter wrote:
> > I am looking for stable hash functions producing 8-byte or 4-byte hashes
> from long text values in Postgres 10 or later.
> >
> > [...]
> >
> > There is an old post from 2012 by Tom Lane suggesting that hashtext()
> and friends are not for users:
> >
> > https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us
>
> Changing a hash function would corrupt hash indexes, wouldn't it?
>
> So I'd expect these functions to be pretty stable:
>
> SELECT amp.amproc
> FROM pg_amproc AS amp
>JOIN pg_opfamily AS opf ON amp.amprocfamily = opf.oid
>JOIN pg_am ON opf.opfmethod = pg_am.oid
> WHERE pg_am.amname = 'hash'
>   AND amp.amprocnum = 1;
>
> Or at least there would have to be a fat warning in the release notes
> to reindex hash indexes.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>


Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-11 Thread Erwin Brandstetter
On Tue, Dec 10, 2019 at 11:34 PM Miles Elam 
wrote:

> In terms of "wasted computation", MD5, SHA1, and the others always compute
> the full length before they are passed to a UUID, int, or whatever. It's a
> sunk cost. It's also a minor cost considering many hash algorithms are
> performed in CPU hardware now. All that's left is the truncation and cast,
> which you can't avoid easily.
>
>
> Sure, you could reimplement Java's .hashCode() method by iterating through
> the characters and processing the character codes:
>
> s[0]*31^(n - 1) + s[1]*31^(n - 2) + ... + s[n - 1]
>
>
> I don't see how that would beat the CPU-based hashes though unless you
> wrote a C-based extension. Maybe it's better just to embrace the
> user-defined function first and then decide if performance is insufficient
> for your use cases.
>
>
> CREATE EXTENSION IF NOT EXISTS pgcrypto;
>
> CREATE OR REPLACE FUNCTION hash8 (p_data text, p_algo text = 'md5')
> RETURNS int8 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 16),
> 'hex'))::bit(64)::int8
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> CREATE OR REPLACE FUNCTION hash4 (p_data text, p_algo text = 'md5')
> RETURNS int4 AS $$
>
> SELECT ('x' || encode(substring(digest(p_data, p_algo) FROM 1 FOR 8),
> 'hex'))::bit(32)::int4
>
> $$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
>
>
> SELECT
>
> hash4('something something something'),
>
> hash4('something something something', 'sha1'),
>
> hash8('something something something'),
>
> hash8('something something something', 'sha1');
>
>
>
> Cheers,
>
>
> Miles
>



Thanks for the custom functions! They may be useful as a fallback.

But I am really looking for standard functions in Postgres first. Those
should be faster and more reliable than writing my own.
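
For what it's worth, a sketch of the built-in candidates discussed in this thread
(undocumented, and per Tom's caveats the results can differ between architectures;
hashtextextended() only exists on 11 and later):

SELECT hashtext('something something something');             -- integer
SELECT hashtextextended('something something something', 0);  -- bigint; second argument is a seed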


Regards

Erwin


Encoding/collation question

2019-12-11 Thread Rich Shepard

My older databases have LATIN1 encoding and C collation; the newer ones have
UTF8 encoding and en_US.UTF-8 collation. A web search taught me that I can
change each old database by dumping it and restoring it with the desired
encoding and collation types. My question is whether the older types make
any difference in a single-user environment.
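
If it matters, a rough sketch of that dump-and-recreate path (database names are
placeholders; the same works in the other direction, and the target locale must
exist on your system):

pg_dump -Fc olddb -f olddb.dump
createdb -T template0 -E UTF8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8 newdb
pg_restore -d newdb olddb.dump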

Regards,

Rich




Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?

2019-12-11 Thread Tom Lane
Erwin Brandstetter  writes:
> Guess Tom's warning in
> https://www.postgresql.org/message-id/9434.1568839...@sss.pgh.pa.us about
> portability only refers to hashtextextended() and friends not being there
> in Postgres 10 or older.

Well, the other portability issue that is worth considering is that
these functions are only intended to give stable results within a
particular installation; their use for hash indexes does not require
the same results across different platforms.  Notably, most of them
give different answers on little-endian and big-endian machines.

That's not necessarily a reason not to use them, but you have to
be careful what you assume about them.

regards, tom lane




Re: Encoding/collation question

2019-12-11 Thread Tom Lane
Rich Shepard  writes:
> My older databases have LATIN1 encoding and C collation; the newer ones have
> UTF8 encoding and en_US.UTF-8 collation. A web search taught me that I can
> change each old database by dumping it and restoring it with the desired
> encoding and collation types. My question is whether the older types make
> any difference in a single-user environment.

String comparisons in non-C collations tend to be a lot slower than
they are in C collation.  Whether this makes a noticeable difference
to you depends on your workload, but certainly we've seen performance
gripes that trace to that.

If your data doesn't require the larger character set of UTF8, then
using LATIN-any is going to offer some space savings (for non-ASCII
characters) plus minor performance benefits due to the lack of
variable-width characters.  This is less significant than the
collation issue, though, for most people.

regards, tom lane




Re: Encoding/collation question

2019-12-11 Thread Rich Shepard

On Wed, 11 Dec 2019, Tom Lane wrote:


String comparisons in non-C collations tend to be a lot slower than they
are in C collation. Whether this makes a noticeable difference to you
depends on your workload, but certainly we've seen performance gripes that
trace to that.


Tom,

How interesting.


If your data doesn't require the larger character set of UTF8, then using
LATIN-any is going to offer some space savings (for non-ASCII characters)
plus minor performance benefits due to the lack of variable-width
characters. This is less significant than the collation issue, though, for
most people.


I doubt that my use will notice meaningful differences. Since there are only
two or three databases in UTF8 and its collation, perhaps I'll convert those
to LATIN1 and C.

Thanks for the insights.

Regards,

Rich




Counting number of sites with same number of sampling dates

2019-12-11 Thread Rich Shepard

A sampling location table has 28 distinct sites, each site being sampled
from 1 to 67 times. I'm trying to obtain the number of sites having 1
sample, 2 samples, ... 67 samples and am not seeing the solution despite
several alternative queries.

The query,

select site_nbr, count(distinct sampdate) from wrb_hg_cong group by site_nbr 
order by site_nbr;

returns the number of times each site has been sampled, for example:

 site_nbr | count
----------+-------
    10332 |    11
    10335 |     1
    10339 |     2
    10340 |     1
    10342 |     4
    10344 |    18
    10347 |     2
    10348 |     2
    10350 |     2
    10351 |     1
    10355 |    14
 etc.

I want the number of sites for each number of samples (e.g., how many sites
with one sampdate, 2 sampdates, etc.). I cannot group by an aggregate such
as count(distinct sampdate) yet I'm confident the syntax is simple and I'm
not seeing how to get the number in each group.

What is an appropriate select statement for this?

TIA,

Rich





Re: Counting number of sites with same number of sampling dates

2019-12-11 Thread Michael Lewis
On Wed, Dec 11, 2019 at 1:54 PM Rich Shepard 
wrote:

> A sampling location table has 28 distinct sites, each site being sampled
> from 1 to 67 times. I'm trying to obtain the number of sites having 1
> sample, 2 samples, ... 67 samples and am not seeing the solution despite
> several alternative queries.
>
> The query,
>
> select site_nbr, count(distinct sampdate) from wrb_hg_cong group by
> site_nbr order by site_nbr;
>
> returns the number of times each site has been sampled, for example:
>
>  site_nbr | count
> ----------+-------
>     10332 |    11
>     10335 |     1
>     10339 |     2
>     10340 |     1
>     10342 |     4
>     10344 |    18
>     10347 |     2
>     10348 |     2
>     10350 |     2
>     10351 |     1
>     10355 |    14
> etc.
>
> I want the number of sites for each number of samples (e.g., how many sites
> with one sampdate, 2 sampdates, etc.). I cannot group by an aggregate such
> as count(distinct sampdate) yet I'm confident the syntax is simple and I'm
> not seeing how to get the number in each group.
>


Put what you have in a subquery and group/aggregate again.

select sample_count, count( site_nbr ) FROM (
select site_nbr, count(distinct sampdate) AS sample_count from wrb_hg_cong
group by site_nbr order by site_nbr
) sub
group by sample_count;


Re: Counting number of sites with same number of sampling dates

2019-12-11 Thread Rich Shepard

On Wed, 11 Dec 2019, Michael Lewis wrote:


Put what you have in a subquery and group/aggregate again.

select sample_count, count( site_nbr ) FROM (
select site_nbr, count(distinct sampdate) AS sample_count from wrb_hg_cong
group by site_nbr order by site_nbr
) sub
group by sample_count;


Michael,

Well, darn! I totally spaced on using a subquery. Thank you very much for an
important lesson.

Best regards,

Rich




Re: Counting number of sites with same number of sampling dates

2019-12-11 Thread Rich Shepard

On Wed, 11 Dec 2019, Ron wrote:


The SUM() function?


Ron,

Interesting. I'll look at this, too.

Regards,

Rich




Re: Logical Replication of Multiple Schema Versions

2019-12-11 Thread Dan shmidt
Thanks Peter for your answer.

I was hoping to keep the freedom of performing any change to my schema.
Since the entire upgrade might take more than a week, there will be a time in 
which I have two publisher nodes with different schema versions (which might 
include a column rename).
I was hoping that there was a way to somehow apply some logic on the subscriber 
to convert one schema to another. I also believe that limiting schema changes 
only to those that won't break replication should suffice.

Dan.


From: Peter Eisentraut 
Sent: Wednesday, December 11, 2019 6:13 PM
To: Dan shmidt ; pgsql-general@lists.postgresql.org 

Subject: Re: Logical Replication of Multiple Schema Versions

On 2019-12-10 08:55, Dan shmidt wrote:
> What is the correct way to perform such an operation?
> Is there a way to keep constraint #1, or is the only option to not allow
> "breaking" schema changes between versions?

It all depends on the specific schema changes you want to make.  You can
add columns on the subscriber and remove columns on the publisher
without breaking things (unless there are not-null constraints).
Renaming columns will break replication until you rename them
everywhere.  Column type changes will usually just work as long as the
data fits into both the old and the new type.

You really need to carefully plan and test each class of scenarios
separately.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Logical replication DNS cache

2019-12-11 Thread Mike Lissner
I've got a server at example.com that currently publishes logical
replication to a server in AWS RDS. I plan to move the server at example.com
so that it has a new IP address (but same domain name).

I'm curious if anybody knows how the logical replication subscriber in AWS
would handle that.

There's at least three layers where the DNS might be cached, creating
breakage once the move is complete:

 - Postgres itself

 - AWS's postgresql fork in RDS might have something

 - The OS underlying amazon's RDS service

I expect this is a tough question unless somebody has done this before, but
any ideas on how postgresql would handle this kind of thing? Or is there a
way to flush the DNS cache that postgresql (or RDS or the OS) has?

I'm just beginning to explore this, but if anybody has experience, I'd love
to hear it.

Thanks,


Mike


Re: Encoding/collation question

2019-12-11 Thread Andrew Gierth
> "Rich" == Rich Shepard  writes:

 Rich> I doubt that my use will notice meaningful differences. Since
 Rich> there are only two or three databases in UTF8 and its collation
 Rich> perhaps I'll convert those to LATIN1 and C.

Note that it's perfectly fine to use UTF8 encoding and C collation (this
has the effect of sorting strings in Unicode codepoint order); this is
as fast for comparisons as LATIN1/C is.

For those cases where you need data to be sorted in a
culturally-meaningful order rather than in codepoint order, you can set
collations on specific columns or in individual queries.
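
A tiny sketch of both options (the table and column are invented, and this assumes
an "en_US" collation exists in pg_collation on your platform):

-- Database stays UTF8 with C collation; one column gets a linguistic sort order:
CREATE TABLE people (
    name text COLLATE "en_US"
);

-- Or ask for it only where it matters, per query:
SELECT name FROM people ORDER BY name COLLATE "en_US";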

-- 
Andrew (irc:RhodiumToad)