Re: Q: fixing collation version mismatches
Karsten Hilbert wrote:

> Which is why my question still stands: does the above
> three-strikes operation safely take care of any collation
> issues that may currently exist in a database ?

For the indexes, yes, but theoretically, all constraints involving
collatable types need a recheck.

For foreign key constraints with non-deterministic collations, there
might be equality tests that pass with an older Unicode version and
fail with a newer Unicode version.

For check constraints as well, checks applied to strings with recent
Unicode characters can give different results after an upgrade.

Best regards,

--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
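A minimal sketch of the foreign-key case described above (hypothetical table and collation names, requires ICU support): under a non-deterministic, case-insensitive collation, the FK equality check itself follows Unicode comparison rules, which can shift when the ICU library is upgraded.

```sql
-- Hypothetical sketch: a foreign key whose equality test depends on a
-- non-deterministic (case-insensitive) ICU collation.
CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2',
                     deterministic = false);

CREATE TABLE colors (name  text COLLATE "ci" PRIMARY KEY);
CREATE TABLE items  (color text COLLATE "ci" REFERENCES colors (name));

-- 'RED' satisfies the FK against 'red' only for as long as the
-- collation still considers the two strings equal.
INSERT INTO colors VALUES ('red');
INSERT INTO items  VALUES ('RED');
```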
Re: Q: fixing collation version mismatches
Am Mon, Nov 14, 2022 at 05:42:16PM +0100 schrieb Daniel Verite:

> > Which is why my question still stands: does the above
> > three-strikes operation safely take care of any collation
> > issues that may currently exist in a database ?
>
> For the indexes, yes, but theoretically, all constraints involving
> collatable types need a recheck.
>
> For foreign key constraints with non-deterministic collations, there
> might be equality tests that pass with an older Unicode version and fail
> with a newer Unicode version.

Which gives weight to the argument that using real-world data (instead
of surrogate keys) may lead to trouble.

> For check constraints as well, checks applied to strings with recent
> Unicode characters can give different results after an upgrade.

Thanks for pointing this out more clearly. My thinking had already
evolved towards also including VALIDATE CONSTRAINT.

I shall, for the record, update the sequence in question:

	-- indices
	REINDEX DATABASE db_in_question;

	-- constraints (check, foreign key)
	UPDATE pg_constraint SET convalidated = false WHERE all_check_and_FK_constraints;
	ALTER TABLE table_with_constraint VALIDATE CONSTRAINT constraint_on_that_table;

	-- other things, see below
	-- ...

	-- refresh collation versions if no errors above
	ALTER DATABASE db_in_question REFRESH COLLATION VERSION;
	ALTER COLLATION every_collation_from_pg_collation REFRESH VERSION;

What else needs to be taken care of, and how ?

partitions

	Rows need to be re-sorted into the proper partition as needed.
	Can this be achieved by

		UPDATE each_partitioned_table SET each_partitioned_key = each_partitioned_key;

	? Courtesy of (ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION):

		Updating the partition key of a row will cause it to be
		moved into a different partition if it no longer satisfies
		the partition bounds of its original partition.

range types

	Will this

		UPDATE table_with_range_type_column SET ranged_column = ranged_column;

	find all relevant issues ?

domains

	Will this

		UPDATE table_with_domain_type_column SET domained_column = domained_column;

	find all relevant issues ?

custom types

	??

function immutability

	?? It can be argued that functions marked IMMUTABLE really are
	not in case they involve sorting of a collatable data type, and
	are thus wrongly marked as IMMUTABLE. IOW, pre-existing user
	error.

If all this has been discussed in detail, I'd be glad for a pointer
into the archive.

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
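One way to avoid hand-writing the invalidate/revalidate pair per table is to generate the statements from the catalog. A sketch only, untested; note that the direct UPDATE of pg_constraint requires superuser privileges:

```sql
-- Sketch (untested): generate the statements for all CHECK ('c') and
-- FOREIGN KEY ('f') table constraints in the current database.
SELECT format('UPDATE pg_constraint SET convalidated = false WHERE oid = %s;',
              c.oid)
       || E'\n'
       || format('ALTER TABLE %I.%I VALIDATE CONSTRAINT %I;',
                 n.nspname, r.relname, c.conname)
FROM pg_constraint c
JOIN pg_class r     ON r.oid = c.conrelid
JOIN pg_namespace n ON n.oid = r.relnamespace
WHERE c.contype IN ('c', 'f')
  AND c.conrelid <> 0;          -- table constraints only, not domains
```

In psql the output could then be executed with \gexec.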
Phantom Permissions Error
What type of bug is this?

Unexpected error

What subsystems and features are affected?

Other

What happened?

We are running into a weird permissions error that possibly seems to be
related to Postgres, but it's not clear what exactly is causing the
issue. We run a script which grants the permissions below to a user,
and for some reason there are many extraneous tables or relations that
we have not created. When GRANT commands are run, does Timescale do any
extra permission changes? We only have 10 tables created in the public
schema.

PostgreSQL version used

14

What operating system did you use?

docker image: timescale-ha-base:pg14.5-ts2.7.2-p0-r0

What installation method did you use?

Docker, Kubernetes

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

	postgres@timescale-0:~$ psql -q -v ON_ERROR_STOP=1 -d mydatabase <<-EOF
	set client_min_messages=DEBUG;
	REVOKE ALL ON DATABASE mydatabase FROM PUBLIC;
	GRANT CONNECT ON DATABASE mydatabase TO sinkd;
	GRANT USAGE ON SCHEMA public TO sinkd;
	GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;  < THIS LINE CAUSES THE ERROR >
	EOF
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public.@" does not exist
	DEBUG: relation "public.@" does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public.8_19?]?" does not exist
	DEBUG: relation "public.8_19?]?" does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public.@" does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public.8_19?]?" does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: rehashing catalog cache id 50 for pg_class; 257 tups, 128 buckets
	DEBUG: rehashing catalog cache id 51 for pg_class; 257 tups, 128 buckets
	DEBUG: rehashing catalog cache id 50 for pg_class; 513 tups, 256 buckets
	DEBUG: rehashing catalog cache id 51 for pg_class; 513 tups, 256 buckets
	DEBUG: rehashing catalog cache id 50 for pg_class; 1025 tups, 512 buckets
	DEBUG: rehashing catalog cache id 51 for pg_class; 1025 tups, 512 buckets
	DEBUG: rehashing catalog cache id 50 for pg_class; 2049 tups, 1024 buckets
	DEBUG: rehashing catalog cache id 51 for pg_class; 2049 tups, 1024 buckets
	DEBUG: relation "public." does not exist
	DEBUG: relation "public." does not exist
	DEBUG: rehashing catalog cache id 50 for pg_class; 4097 tups, 2048 buckets
	DEBUG: rehashing catalog cache id 51 for pg_class; 4097 tups, 2048 buckets
	ERROR: relation "public." does not exist

The resulting error is also not consistent; the last two times I ran
the job, the errors shown were:

	ERROR: relation "public.+2022-09-17T03:00:00Z" does not exist
	ERROR: relation "public.k__compressed_hypertable_20_tid__ts" does not exist

One other point to consider: the command succeeds if run interactively
from psql, but not if the command comes from stdin exactly as written
above. Adding the verbose setting shows this:

	postgres@timescale-0:~$ psql -q -v ON_ERROR_STOP=1 -d obcerv <<-EOF
	-- sinkd
	\set VERBOSITY verbose
	GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;
	> EOF
	ERROR: 42P01: relation "public." does not exist
	LOCATION: RangeVarGetRel
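Since GRANT ... ON ALL TABLES IN SCHEMA expands the relation list itself, it may help to compare that expansion with what the catalog actually contains. A query along these lines (standard PostgreSQL catalogs, nothing Timescale-specific) lists every relation in public, quoting names so that empty-looking or control-character names become visible:

```sql
-- List all relations in schema public; quote_ident() makes odd names
-- (empty strings, control characters) stand out.
SELECT quote_ident(c.relname) AS relname,
       c.relkind,
       c.oid
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
ORDER BY c.oid;
```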
PANIC: could not flush dirty data: Cannot allocate memory
Hi all!

We have a setup with a master and plenty of logical replication slaves.
Master and slaves are 12.12-1.pgdg22.04+1 running on Ubuntu 22.04.

	SELECT pg_size_pretty( pg_database_size('regdns') );

is from 25GB (freshly installed slave) to 42GB (probably bloat).

Replication slave VMs have between 22G and 48G RAM; most have 48G RAM.

We are using:

	maintenance_work_mem = 128MB
	work_mem = 64MB

and on VMs with 48G RAM:

	effective_cache_size = 8192MB
	shared_buffers = 6144MB

and on VMs with 22G RAM:

	effective_cache_size = 4096MB
	shared_buffers = 2048MB

On several servers we see the error message:

	PANIC: could not flush dirty data: Cannot allocate memory

Unfortunately I do not find any reference to this kind of error. Can
you please describe what happens here in detail? Is it related to
server memory? Or our memory settings?

I am not so surprised that it happens on the 22G RAM VM. It is not
happening on our 32G RAM VMs. But it also happens on some of the 48G
RAM VMs, which should have plenty of RAM available:

	# free -h
	               total   used    free  shared  buff/cache  available
	Mem:            47Gi    9Gi   1.2Gi   6.1Gi        35Gi       30Gi
	Swap:          7.8Gi  3.0Gi   4.9Gi

Of course I could upgrade all our VMs and then wait and see if it
solved the problem. But I would like to understand what is happening
here before spending $$$.

Thanks
Klaus
Re: PANIC: could not flush dirty data: Cannot allocate memory
klaus.mailingli...@pernau.at writes:
> On several servers we see the error message: PANIC: could not flush
> dirty data: Cannot allocate memory

What that's telling you is that fsync (or some equivalent OS call)
returned ENOMEM, which would seem to be a kernel-level deficiency.
Perhaps you could dodge it by using a different wal_sync_method
setting, but complaining to your kernel vendor seems like the main
thing to be doing.

The reason we treat it as a PANIC condition is

	* Failure to fsync any data file is cause for immediate panic, unless
	* data_sync_retry is enabled. Data may have been written to the operating
	* system and removed from our buffer pool already, and if we are running on
	* an operating system that forgets dirty data on write-back failure, there
	* may be only one copy of the data remaining: in the WAL. A later attempt to
	* fsync again might falsely report success. Therefore we must not allow any
	* further checkpoints to be attempted. data_sync_retry can in theory be
	* enabled on systems known not to drop dirty buffered data on write-back
	* failure (with the likely outcome that checkpoints will continue to fail
	* until the underlying problem is fixed).

As noted here, turning on data_sync_retry would reduce the PANIC to a
WARNING. But I wouldn't recommend that without some assurances from
your kernel vendor about what happens in the kernel after such a
failure. The panic restart should (in theory) ensure data consistency
is preserved; without it we can't offer any guarantees.

			regards, tom lane
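For reference, the two knobs mentioned above, in postgresql.conf form. The values shown here are just the defaults; flipping data_sync_retry is only safe with the kernel-vendor assurances described:

```
# postgresql.conf
wal_sync_method = fdatasync    # or fsync, open_sync, open_datasync;
                               # fdatasync is the usual Linux default
data_sync_retry = off          # 'on' downgrades this PANIC to a WARNING,
                               # safe only if the OS keeps dirty buffers
                               # after a write-back failure
```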
Re: PANIC: could not flush dirty data: Cannot allocate memory
## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):

> On several servers we see the error message: PANIC: could not flush
> dirty data: Cannot allocate memory

As far as I can see, that "could not flush dirty data" appears a total
of three times in the code - there are other places where PostgreSQL
could PANIC on fsync()-and-stuff-related issues, but they have
different messages. Of these three places, there's a sync_file_range(),
a posix_fadvise() and an msync(), all in src/backend/storage/file/fd.c.
"Cannot allocate memory" would be ENOMEM, which posix_fadvise() does
not return (as per its docs). So this would be sync_file_range(), which
could run out of memory (as per the manual), or msync(), where ENOMEM
actually means "The indicated memory (or part of it) was not mapped".
Both cases are somewhat WTF for this setup.

What filesystem are you running?

Regards,
Christoph

--
Spare Space
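The errno behind the message can be confirmed mechanically: "Cannot allocate memory" is the strerror text for ENOMEM (glibc wording; a quick check, not part of the thread):

```python
import errno
import os

# "Cannot allocate memory" is how glibc spells ENOMEM, which is how
# the errno can be read off the PANIC message in the server log.
print(errno.ENOMEM)               # 12 on Linux
print(os.strerror(errno.ENOMEM))  # Cannot allocate memory
```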
Re: PANIC: could not flush dirty data: Cannot allocate memory
On Tue, Nov 15, 2022 at 10:54 AM Christoph Moench-Tegeder wrote:
> ## klaus.mailingli...@pernau.at (klaus.mailingli...@pernau.at):
> > On several servers we see the error message: PANIC: could not flush
> > dirty data: Cannot allocate memory
>
> Of these three places, there's an sync_file_range(), an posix_fadvise()
> and an msync(), all in src/backend/storage/file/fd.c. "Cannot allocate
> memory" would be ENOMEM, which posix_fadvise() does not return (as per
> it's docs). So this would be sync_file_range(), which could run out
> of memory (as per the manual) or msync() where ENOMEM actually means
> "The indicated memory (or part of it) was not mapped". Both cases are
> somewhat WTF for this setup.

It must be sync_file_range(). The others are fallbacks that wouldn't
apply on a modern Linux.

It has been argued before that we might have been over-zealous applying
the PANIC promotion logic to sync_file_range(). It's used to start
asynchronous writeback to make the later fsync() call fast, so it's
"only a hint", but I have no idea if it could report a writeback error
from the kernel that would then be consumed and not reported to the
later fsync(), so I defaulted to assuming that it could.
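A sketch of the pattern being described, not PostgreSQL's actual code: dirty a file, hint the kernel with sync_file_range() to start write-back early, then do the real fsync(). Linux-only; the helper name and error handling are illustrative.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Dirty one byte of `path`, hint the kernel to begin write-back now
 * (so the later fsync() has little left to do), then fsync for real.
 * Returns 0 on success, -1 with errno set on failure.  An ENOMEM from
 * sync_file_range() here is the case discussed in the thread. */
int hint_and_flush(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) != 1)
    {
        close(fd);
        return -1;
    }

    /* The "hint": start asynchronous write-back of the whole file. */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) < 0)
    {
        close(fd);
        return -1;
    }

    /* The durable flush the hint was preparing for. */
    if (fsync(fd) < 0)
    {
        close(fd);
        return -1;
    }

    close(fd);
    return unlink(path);
}
```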
Re: How to check stream replication latest history status
At Mon, 14 Nov 2022 00:09:36 +0800, 徐志宇徐 wrote in
> If the primary and standby have a long replay_lag, I don't want the
> standby server to promote.
>
> Is there any view or function which stores the latest stream
> replication gap (replay_lag) between primary and standby?
> Or is there any workaround to know this stream replication
> gap (replay_lag) information? Thanks for your help.

replay_lag vanishes when the stream is gone. But commit timestamps
might work for you. You can see when the last replayed transaction on
the standby was committed on the primary.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
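A sketch of the commit-timestamp approach. It assumes track_commit_timestamp = on was set (and the server restarted) before the transactions of interest, on the primary and, for querying, on the standby as well:

```sql
-- On the standby: xid and primary-side commit time of the most
-- recently replayed transaction.
SELECT * FROM pg_last_committed_xact();

-- Rough age of the last replayed commit, usable in a promotion check;
-- note it also grows when the primary is simply idle.
SELECT now() - timestamp AS last_commit_age
FROM pg_last_committed_xact();
```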
Re: PANIC: could not flush dirty data: Cannot allocate memory
Thomas Munro writes:
> It has been argued before that we might have been over-zealous
> applying the PANIC promotion logic to sync_file_range(). It's used to
> start asynchronous writeback to make the later fsync() call fast, so
> it's "only a hint", but I have no idea if it could report a writeback
> error from the kernel that would then be consumed and not reported to
> the later fsync(), so I defaulted to assuming that it could.

Certainly, if it reports EIO, we should panic. But maybe not for
ENOMEM? One would assume that that means that the request didn't get
queued for lack of in-kernel memory space ... in which case "nothing
happened".

			regards, tom lane