Re: Q: fixing collation version mismatches

2022-11-14 Thread Daniel Verite
Karsten Hilbert wrote:

> Which is why my question still stands: does the above
> three-strikes operation safely take care of any collation
> issues that may currently exist in a database ?

For the indexes, yes, but in theory all constraints involving collatable
types need a recheck.

For foreign key constraints with non-deterministic collations, there
might be equality tests that pass with an older Unicode version and fail
with a newer Unicode version.

For check constraints as well, checks applied to strings with recent
Unicode characters can give different results after an upgrade.
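
To illustrate (a minimal sketch with hypothetical table names, assuming an
ICU build): with a non-deterministic collation, which strings compare equal
is decided by the collation library, so an FK lookup that matches under one
ICU version may stop matching under another.

-- Case-insensitive collation: equality is delegated to ICU.
CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2',
                     deterministic = false);

CREATE TABLE parent (code text COLLATE ci PRIMARY KEY);
CREATE TABLE child  (code text COLLATE ci REFERENCES parent (code));

-- The FK check compares these values under the collation's notion of
-- equality; a library upgrade can shift such results.
INSERT INTO parent VALUES ('STRASSE');
INSERT INTO child  VALUES ('strasse');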


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Q: fixing collation version mismatches

2022-11-14 Thread Karsten Hilbert
On Mon, Nov 14, 2022 at 05:42:16PM +0100, Daniel Verite wrote:

> > Which is why my question still stands: does the above
> > three-strikes operation safely take care of any collation
> > issues that may currently exist in a database ?
>
> For the indexes, yes, but in theory all constraints involving collatable
> types need a recheck.
>
> For foreign key constraints with non-deterministic collations, there
> might be equality tests that pass with an older Unicode version and fail
> with a newer Unicode version.

Which gives weight to the argument that using natural (real-world) keys
instead of surrogate keys may lead to trouble.

> For check constraints as well, checks applied to strings with recent
> Unicode characters can give different results after an upgrade.

Thanks for pointing this out more clearly. My thinking had
already evolved towards also including VALIDATE CONSTRAINT.

I shall, for the record, update the sequence in question:

-- indices
REINDEX DATABASE db_in_question;
-- constraints (check, foreign key)
UPDATE pg_constraint SET convalidated = false WHERE all_check_and_FK_constraints;
ALTER TABLE table_with_constraint VALIDATE CONSTRAINT constraint_on_that_table;
-- other things, see below
-- ...
-- refresh collation versions if no errors above
ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;
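
A concrete form of the constraint step (a sketch; updating pg_constraint
directly is unsupported catalog surgery, so assume superuser access and a
fresh backup):

-- Mark all table CHECK and FK constraints as not validated.
UPDATE pg_constraint
   SET convalidated = false
 WHERE contype IN ('c', 'f')
   AND conrelid <> 0;

-- Generate the VALIDATE statements; in psql, run the output with \gexec.
SELECT format('ALTER TABLE %s VALIDATE CONSTRAINT %I;',
              conrelid::regclass, conname)
  FROM pg_constraint
 WHERE contype IN ('c', 'f')
   AND conrelid <> 0
   AND NOT convalidated;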

What else needs to be taken care of, and how ?

partitions

Rows need to be re-routed into the proper partition where the new
collation order demands it.

Can this be achieved by

UPDATE each_partitioned_table SET each_partitioned_key = each_partitioned_key;

?

Courtesy of 
(ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION):

Updating the partition key of a row will cause it to
be moved into a different partition if it no longer
satisfies the partition bounds of its original
partition.
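
A sketch of that mechanism (hypothetical names): for a table
range-partitioned on a collatable text column, a no-op rewrite of the key
sends every row through partition routing again, moving rows whose keys
now sort differently.

-- Range partitioning on text: partition bounds are compared under
-- the column's collation.
CREATE TABLE t (k text) PARTITION BY RANGE (k);
CREATE TABLE t_a_m PARTITION OF t FOR VALUES FROM ('a') TO ('m');
CREATE TABLE t_m_z PARTITION OF t FOR VALUES FROM ('m') TO ('z');

-- After a collation library change, a key near a bound may belong in
-- the other partition; updating the key re-routes such rows.
UPDATE t SET k = k;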

range types

Will this

UPDATE table_with_range_type_column SET ranged_column = ranged_column;

find all relevant issues ?
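
For context (a sketch, assuming an ICU build): a range over a collatable
subtype bakes the collation's sort order into its lower <= upper invariant,
so whether a stored range is still well-formed under the new library
version is exactly what needs re-checking.

-- A range type ordered by an ICU collation.
CREATE TYPE textrange AS RANGE (
    subtype   = text,
    collation = "en-x-icu"
);

-- The lower <= upper check happens when a range value is constructed.
SELECT textrange('apple', 'banana');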

domains

Will this

UPDATE table_with_domain_type_column SET domained_column = domained_column;

find all relevant issues ?
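
For context (a sketch with a hypothetical domain): a domain CHECK can hide
a collation-dependent comparison, so its outcome can change with the
library version.

-- BETWEEN on text compares under the collation's sort order.
CREATE DOMAIN short_code AS text
    CHECK (VALUE BETWEEN 'aaa' AND 'mzz');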

custom types

??

function immutability ??

It can be argued that functions marked IMMUTABLE are not truly
immutable if they compare or sort collatable data, and are thus
wrongly marked IMMUTABLE.

IOW, pre-existing user error.
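
For example (a sketch, hypothetical names): an expression index built on
such a function silently depends on the collation library version.

-- least() on text picks the smaller value under the collation's sort
-- order, so this is only "immutable" for a fixed library version.
CREATE FUNCTION first_alphabetically(a text, b text) RETURNS text
    IMMUTABLE LANGUAGE sql
    AS $$ SELECT least(a, b) $$;

CREATE INDEX ON some_table (first_alphabetically(col1, col2));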

If all this has been discussed in detail, I'd be glad for a
pointer into the archive.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Phantom Permissions Error

2022-11-14 Thread Upesh Desai
What type of bug is this?
Unexpected error
What subsystems and features are affected?
Other
What happened?
We are running into a weird permissions error that seems to be related to
Postgres, but it's not clear what exactly is causing the issue. We run a
script which grants the permissions below to a user, and for some reason
there are many extraneous tables or relations that we have not created.
When GRANT commands are run, does Timescale make any extra permission
changes? We only have 10 tables created in the public schema.

PostgreSQL version used
14

What operating system did you use?
docker image: timescale-ha-base:pg14.5-ts2.7.2-p0-r0

What installation method did you use?
Docker, Kubernetes

What platform did you run on?
On prem/Self-hosted

Relevant log output and stack trace
postgres@timescale-0:~$ psql -q -v ON_ERROR_STOP=1 -d mydatabase <<-EOF
  set client_min_messages=DEBUG;
  REVOKE ALL ON DATABASE mydatabase FROM PUBLIC;
  GRANT CONNECT ON DATABASE mydatabase TO sinkd;
  GRANT USAGE ON SCHEMA public TO sinkd;
  GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;  <-- THIS LINE CAUSES THE ERROR
> EOF
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.@" does not exist
DEBUG:  relation "public.@" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.8_19?]?" does not exist
DEBUG:  relation "public.8_19?]?" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.@" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.8_19?]?" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  rehashing catalog cache id 50 for pg_class; 257 tups, 128 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 257 tups, 128 buckets
DEBUG:  rehashing catalog cache id 50 for pg_class; 513 tups, 256 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 513 tups, 256 buckets
DEBUG:  rehashing catalog cache id 50 for pg_class; 1025 tups, 512 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 1025 tups, 512 buckets
DEBUG:  rehashing catalog cache id 50 for pg_class; 2049 tups, 1024 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 2049 tups, 1024 buckets
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  rehashing catalog cache id 50 for pg_class; 4097 tups, 2048 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 4097 tups, 2048 buckets
ERROR:  relation "public." does not exist


The resulting error is also not consistent; the last two times I ran the
job, the errors shown were:

ERROR: relation "public.+2022-09-17T03:00:00Z" does not exist
ERROR:  relation "public.k__compressed_hypertable_20_tid__ts" does not exist


One other point to consider: the command succeeds if run interactively
from psql, but not if it comes from stdin exactly as written above.

Adding the verbose setting shows this:

postgres@timescale-0:~$ psql -q -v ON_ERROR_STOP=1 -d obcerv <<-EOF
-- sinkd
\set VERBOSITY verbose
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;
> EOF
ERROR: 42P01: relation "public." does not exist
LOCATION: RangeVarGetRel

PANIC: could not flush dirty data: Cannot allocate memory

2022-11-14 Thread klaus . mailinglists

Hi all!

We have a setup with a master and plenty of logical replication slaves.
Master and slaves are 12.12-1.pgdg22.04+1 running on Ubuntu 22.04.
SELECT pg_size_pretty( pg_database_size('regdns') ); reports between 25GB
(freshly installed slave) and 42GB (probably bloat).


Replication slave VMs have between 22G and 48G RAM; most have 48G RAM.

We are using:
maintenance_work_mem = 128MB
work_mem = 64MB

and VMs with 48G RAM:
effective_cache_size = 8192MB
shared_buffers = 6144MB

and VMs with 22G RAM:
effective_cache_size = 4096MB
shared_buffers = 2048MB

On several servers we see the error message: PANIC:  could not flush 
dirty data: Cannot allocate memory


Unfortunately I cannot find any reference to this kind of error. Can you 
please describe what happens here in detail? Is it related to server 
memory, or to our memory settings? I am not so surprised that it happens 
on the 22G RAM VM. It does not happen on our 32G RAM VMs. But it also 
happens on some of the 48G RAM VMs, which should have plenty of RAM 
available:

# free -h
               total        used        free      shared  buff/cache   available
Mem:            47Gi         9Gi       1.2Gi       6.1Gi        35Gi        30Gi
Swap:          7.8Gi       3.0Gi       4.9Gi

Of course I could upgrade all our VMs and then wait and see if that solves 
the problem. But I would like to understand what is happening here 
before spending $$$.


Thanks
Klaus





Re: PANIC: could not flush dirty data: Cannot allocate memory

2022-11-14 Thread Tom Lane
klaus.mailingli...@pernau.at writes:
> On several servers we see the error message: PANIC:  could not flush 
> dirty data: Cannot allocate memory

What that's telling you is that fsync (or some equivalent OS call)
returned ENOMEM, which would seem to be a kernel-level deficiency.
Perhaps you could dodge it by using a different wal_sync_method setting,
but complaining to your kernel vendor seems like the main thing
to be doing.

The reason we treat it as a PANIC condition is

 * Failure to fsync any data file is cause for immediate panic, unless
 * data_sync_retry is enabled.  Data may have been written to the operating
 * system and removed from our buffer pool already, and if we are running on
 * an operating system that forgets dirty data on write-back failure, there
 * may be only one copy of the data remaining: in the WAL.  A later attempt to
 * fsync again might falsely report success.  Therefore we must not allow any
 * further checkpoints to be attempted.  data_sync_retry can in theory be
 * enabled on systems known not to drop dirty buffered data on write-back
 * failure (with the likely outcome that checkpoints will continue to fail
 * until the underlying problem is fixed).

As noted here, turning on data_sync_retry would reduce the PANIC to
a WARNING.  But I wouldn't recommend that without some assurances
from your kernel vendor about what happens in the kernel after such a
failure.  The panic restart should (in theory) ensure data consistency
is preserved; without it we can't offer any guarantees.
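
To make that concrete (a sketch; wal_sync_method takes effect on reload,
while data_sync_retry can only be set at server start):

-- Inspect the current settings.
SHOW wal_sync_method;
SHOW data_sync_retry;

-- Try one of the documented sync methods for your platform, then reload.
ALTER SYSTEM SET wal_sync_method = 'fdatasync';
SELECT pg_reload_conf();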

regards, tom lane




Re: PANIC: could not flush dirty data: Cannot allocate memory

2022-11-14 Thread Christoph Moench-Tegeder
## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):

> On several servers we see the error message: PANIC:  could not flush 
> dirty data: Cannot allocate memory

As far as I can see, that "could not flush dirty data" appears a total of
three times in the code - there are other places where PostgreSQL can
PANIC on fsync()-and-related issues, but those have different messages.
Of these three places, one is a sync_file_range(), one a posix_fadvise()
and one an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate
memory" would be ENOMEM, which posix_fadvise() does not return (as per
its docs). So this would be sync_file_range(), which can run out of
memory (as per the manual), or msync(), where ENOMEM actually means
"the indicated memory (or part of it) was not mapped". Both cases are
somewhat WTF for this setup.
What filesystem are you running?

Regards,
Christoph

-- 
Spare Space




Re: PANIC: could not flush dirty data: Cannot allocate memory

2022-11-14 Thread Thomas Munro
On Tue, Nov 15, 2022 at 10:54 AM Christoph Moench-Tegeder
 wrote:
> ## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):
> > On several servers we see the error message: PANIC:  could not flush
> > dirty data: Cannot allocate memory

> Of these three places, one is a sync_file_range(), one a posix_fadvise()
> and one an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate
> memory" would be ENOMEM, which posix_fadvise() does not return (as per
> its docs). So this would be sync_file_range(), which can run out of
> memory (as per the manual), or msync(), where ENOMEM actually means
> "the indicated memory (or part of it) was not mapped". Both cases are
> somewhat WTF for this setup.

It must be sync_file_range().  The others are fallbacks that wouldn't
apply on a modern Linux.

It has been argued before that we might have been over-zealous
applying the PANIC promotion logic to sync_file_range().  It's used to
start asynchronous writeback to make the later fsync() call fast, so
it's "only a hint", but I have no idea if it could report a writeback
error from the kernel that would then be consumed and not reported to
the later fsync(), so I defaulted to assuming that it could.




Re: How to check stream replication latest history status

2022-11-14 Thread Kyotaro Horiguchi
At Mon, 14 Nov 2022 00:09:36 +0800, 徐志宇徐  wrote in 
>   If the primary and standby have a long replay_lag, I don't want the
> standby server to promote.
> 
>   Is there any view or function which stores the latest stream
> replication gap (replay_lag) between primary and standby ?
>   Or is there any workaround to know this stream replication
> gap (replay_lag) information ?  Thanks for your help.

replay_lag vanishes when the stream is gone. But commit timestamps
might work for you. You can see when the last replayed transaction on
the standby was committed on the primary.
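
A sketch of that approach (requires track_commit_timestamp = on, a
restart-level setting, on both primary and standby):

-- On the standby: primary-side commit time of the last replayed
-- transaction; its distance from now() approximates the replay gap.
SELECT now() - x.timestamp AS approx_replay_gap
  FROM pg_last_committed_xact() AS x;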

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PANIC: could not flush dirty data: Cannot allocate memory

2022-11-14 Thread Tom Lane
Thomas Munro  writes:
> It has been argued before that we might have been over-zealous
> applying the PANIC promotion logic to sync_file_range().  It's used to
> start asynchronous writeback to make the later fsync() call fast, so
> it's "only a hint", but I have no idea if it could report a writeback
> error from the kernel that would then be consumed and not reported to
> the later fsync(), so I defaulted to assuming that it could.

Certainly, if it reports EIO, we should panic.  But maybe not for
ENOMEM?  One would assume that that means that the request didn't
get queued for lack of in-kernel memory space ... in which case
"nothing happened".

regards, tom lane