Shared buffer hash table corrupted
Hi All,

Running PostgreSQL 9.6.15, this morning we got a 'shared buffer hash table corrupted' error on a query. I reran the query a couple of hours later, and it completed without error. This is running in production on a Linode instance which hasn't seen any config changes in months. I didn't find much online about this.

How concerned should I be? Would you move the instance to a different physical host?

Thanks,
Mark
Re: Shared buffer hash table corrupted
On Fri, Feb 21, 2020 at 2:53 PM Tom Lane wrote:
> Personally, I'd restart the postmaster, but not do more than that unless
> the error recurs.

Thanks for the response. I did restart the postmaster yesterday. Earlier this morning, a query that normally completes fine started to error out with 'invalid memory alloc request size 18446744073709551613'. Needless to say, our database isn't quite that size. This query was against a table in a different database than the one that had the corruption warning yesterday. Restarting the postmaster again fixed the problem. For good measure I restarted the machine as well.

I need to decide what to do next, if anything. We have a hot standby that we also run queries against, and it hasn't shown any errors. I can switch over to that as the primary. Or I can move the main database to a different physical host. Thoughts appreciated.

Thanks,
Mark
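P.S. For what it's worth, that request size is suspicious in a telling way: 18446744073709551613 is 2^64 - 3, i.e. a small negative length reinterpreted as an unsigned 64-bit allocation size, which fits the flipped-bits theory. A quick sanity check in psql (purely illustrative):

    -- 2^64 - 3 reproduces the bogus alloc request size exactly
    SELECT 2::numeric ^ 64 - 3;
    --  returns 18446744073709551613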
Re: Shared buffer hash table corrupted
On Sat, Feb 22, 2020 at 9:34 AM Tom Lane wrote:
> Um. At that point I'd agree with your concern about developing hardware
> problems. Both of these symptoms could be easily explained by dropped
> bits in PG's shared memory area. Do you happen to know if the server
> has ECC RAM?

Yes, it appears that Linode uses ECC and other server-grade hardware for their machines.

Thanks,
Mark
Logical replication/publication question
Hi,

We're migrating from 9.6 to 14, using pglogical. We have several logical slots on the 9.6 instance implementing a change data capture pattern. For the migration, we plan on recreating the slots on the 14 instance, without taking a snapshot of the data. When the migration happens, we will simply start using the slots on the 14 instance (with the understanding that the LSNs won't match between the 9.6 and 14 instances).

In testing, we have this working, but there was a wrinkle and I'd like to know if my understanding is correct. On the 9.6 instance, when creating the replication slots, we would use the START_REPLICATION SLOT command and then immediately take a snapshot, and it has worked great for years. On the 14 instance, receiving changes from the 9.6 instance via pglogical, if we recreate the logical slot but do not take a snapshot, no changes are propagated, and there are no errors. However, if we first issue a 'CREATE PUBLICATION' command, then changes are propagated to the slot as expected. We never issued a 'CREATE PUBLICATION' command on the 9.6 instance.

My guess is that the act of taking a snapshot twiddled whatever bits were necessary to propagate changes through the slot. By not taking a snapshot on the 14 instance, that doesn't happen, hence the need for the 'CREATE PUBLICATION' command. Alternatively, something was changed between 9.6 and 14 that now requires the new command (I looked through the docs and could find no mention, however).

Is my understanding correct? Am I missing something? I just want to make sure I'm not screwing something up.

Thanks,
Mark
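P.S. One data point that may be relevant: publications didn't exist before PostgreSQL 10, and the built-in pgoutput plugin only emits changes for tables included in a publication named by the consumer. If the recreated slots on 14 use pgoutput (an assumption on my part), the 'CREATE PUBLICATION' requirement would be expected and unrelated to snapshots. A minimal sketch, with hypothetical names:

    -- On the PG 14 instance:
    SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');
    CREATE PUBLICATION cdc_pub FOR ALL TABLES;
    -- A consumer then streams with something like:
    --   START_REPLICATION SLOT cdc_slot LOGICAL 0/0
    --       (proto_version '1', publication_names 'cdc_pub')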
Re: Oracle vs PG
A Twitter thread from someone who talked with the reporter (also read Werner's statement referenced in the first tweet): https://twitter.com/andy_pavlo/status/1055154039606910976

Mark
ERROR: found multixact XX from before relminmxid YY
Hi,

Starting yesterday morning, auto vacuuming of one of our PostgreSQL 9.6.10 (CentOS 7) tables started failing:

    ERROR: found multixact 370350365 from before relminmxid 765860874
    CONTEXT: automatic vacuum of table "userdb.public.subs"

This is about as plain and simple a table as there is. No triggers or foreign keys, and I'm not using any extensions. It has about 2.8M rows. I have not done any consistency checks, but nothing strange has manifested in production.

Reading the various discussions about this error, the only solution I found was here:

https://www.postgresql.org/message-id/CAGewt-ukbL6WL8cc-G%2BiN9AVvmMQkhA9i2TKP4-6wJr6YOQkzA%40mail.gmail.com

But I found no other reports of this solving the problem. Can someone verify that if I do the mentioned fix (and I assume upgrade to 9.6.11) it will fix the problem? And that it doesn't indicate table corruption?

Thanks,
Mark
Re: ERROR: found multixact XX from before relminmxid YY
On Fri, Dec 28, 2018 at 4:49 PM Tom Lane wrote:
> Yeah, SELECT FOR UPDATE should overwrite the broken xmax value and thereby
> fix it, I expect. However, I don't see anything in the release notes
> suggesting that we've fixed any related bugs since 9.6.10, so if this
> just appeared then we've still got a problem :-(. Did anything
> interesting happen since your last successful autovacuum on that table?
> Database crashes, WAL-related parameter changes, that sort of thing?

The last autovacuum of that table was on Dec 8th, and the last auto-analyze was Dec 26. There have been no schema changes on that particular table, database crashes, or WAL-related parameter changes since then. We've done other schema changes during that time, but otherwise the database has been stable.

Thanks,
Mark
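P.S. For anyone finding this later: the fix discussed in the linked thread amounts to a SELECT FOR UPDATE pass over the table (which rewrites each row's xmax) followed by a vacuum. A rough sketch, not quoted verbatim from the thread — take a backup and test on a copy first:

    BEGIN;
    -- Lock every row, replacing any broken xmax along the way
    SELECT count(*) FROM (SELECT * FROM public.subs FOR UPDATE) s;
    COMMIT;
    VACUUM public.subs;  -- should now get past the multixact error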
Re: Pulling data from Postgres DB table for every 5 seconds.
On Wed, Jan 9, 2019 at 9:02 AM github kran wrote:
> Hi Postgres Team,
>
> I have an application using RDS Aurora Postgresql 9.6 version having 4 TB
> of DB size. In this DB we have a table PRODUCT_INFO with around 1 million
> rows and a table size of 1 GB. We are looking for an implementation where
> we want to pull the data in real time every 5 seconds from the DB (table
> mentioned above) and send it to an IoT topic whenever an event occurs for
> a product (an event is any new product information or a change in the
> existing product information).

It's unclear whether you want to do table scans or whether you're just looking for changes to the database. If you're just looking for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an Elasticsearch cluster, and it works great.

Mark
Re: Pulling data from Postgres DB table for every 5 seconds.
On Wed, Jan 9, 2019 at 10:10 AM github kran wrote:
> Mark - We are currently on the 9.6 version of Postgres and can't use this
> feature of logical replication. Answering your question, we are looking
> for any changes in the data related to a specific table (changes like any
> update on a timestamp field OR any new inserts that happened in the last
> 5 seconds for a specific product entity). Any other alternatives?

The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.

Mark
Re: Pulling data from Postgres DB table for every 5 seconds.
On Wed, Jan 9, 2019 at 12:58 PM github kran wrote:
> Mark - just curious to know on the logical replication. Do you think I can
> use it for my use case, where I need to publish data to a subscriber when
> there is a change in the data updated for a row or any new inserts
> happening on the table? The intention is to send this data in JSON format
> by collecting this modified data in real time to a subscriber.

From what you've said, it's a great use case for that feature. The one thing to note is that you will have to code up a logical replication client. If I can do it, pretty much anyone can, but it might take some time to get things right. I wrote about some of what I found when developing our client a year ago here:

https://wingedpig.com/2017/09/20/streaming-postgres-changes/

We ended up just using the test output plugin (test_decoding) that comes with the PostgreSQL distribution. And we didn't end up streaming to Kafka or anything else first. We just take the data and insert it into our Elasticsearch cluster directly as we get it.

Mark
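P.S. In case it helps you experiment before writing a full client, the same plumbing is reachable from plain SQL (slot name hypothetical):

    -- Create a slot that decodes changes with test_decoding
    SELECT pg_create_logical_replication_slot('es_feed', 'test_decoding');
    -- Poll it; each returned row describes one INSERT/UPDATE/DELETE
    SELECT lsn, xid, data
    FROM pg_logical_slot_get_changes('es_feed', NULL, NULL);

A real client would use the streaming replication protocol instead of polling, but the decoded output is the same.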
Re: Barman disaster recovery solution
On Wed, Feb 27, 2019 at 1:39 AM Achilleas Mantzios <ach...@matrix.gatewaynet.com> wrote:
> Hello, as promised here is my blog:
> https://severalnines.com/blog/current-state-open-source-backup-management-postgresql

Nice blog post. If you're aiming for a comprehensive rundown of tools, I suggest including wal-g. We've been using it since it was released (and its predecessor wal-e for years before that), and it's been great. We currently use it to back up a replica to S3, and it has no issues doing that. In more recent versions, it supports delta backups, which in our case decrease the time/load required for a backup immensely.

Cheers,
Mark
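P.S. For anyone curious what hooking wal-g up looks like, the PostgreSQL side is just two settings (illustrative; assumes wal-g and its S3 credentials are already configured on the host):

    ALTER SYSTEM SET archive_mode = 'on';                   -- needs a restart
    ALTER SYSTEM SET archive_command = 'wal-g wal-push %p'; -- reload suffices

Base backups are then taken with wal-g's backup-push command, and delta backups are controlled by its WALG_DELTA_MAX_STEPS setting.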
Non-pausing table scan on 9.6 replica?
Hi All,

On a 9.6 streaming replica, we do table scans for stats and other things. During these scans, the replication is paused (the 'recovering' postgres process has 'waiting' appended to it). We're not using transactions with these scans. Is there anything we can do to prevent the pausing?

Thanks,
Mark
Re: Non-pausing table scan on 9.6 replica?
Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer wrote:
> have you set max_standby_streaming_delay? The default is 30
> seconds, which means that this will be the maximum time allowed for a
> replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.

> You can use hot_standby_feedback = on, but the downside will be more
> bloat on the tables.

I'm not sure I understand. I'm not worried about the query being cancelled on the replica; max_standby_streaming_delay fixes that for us. I'm worried about the streaming replication being paused while this table scan is running. If the table scan takes several minutes, then the replica becomes several minutes out of sync with the master. I'd prefer that not to happen, and I'm wondering if there's a way to do that.

Thanks,
Mark
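P.S. For anyone wanting to see the effect I'm describing, the replay lag is easy to watch (illustrative):

    -- Run on the standby: how far behind the primary is replay right now?
    SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;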
Re: Non-pausing table scan on 9.6 replica?
Andreas, Sameer,

Thank you for replying. I did not understand the purpose of hot_standby_feedback, and your explanations helped. I turned it on, and the pausing stopped.

Thanks,
Mark
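P.S. For the archives, enabling it doesn't require a restart. Something like this on the standby (illustrative):

    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();  -- picks up the change without a restart

The trade-off, as Andreas noted, is potentially more bloat on the primary, since the standby's feedback keeps vacuum from removing rows that the standby's queries still need.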
Re: schema change tracking
On Thu, May 16, 2019 at 9:41 AM Benedict Holland <benedict.m.holl...@gmail.com> wrote:
> I need a tool that can track schema changes in a postgresql database, write
> scripts to alter the tables, and store those changes in git. Are there
> tools that exist that can do this?

We ended up rolling our own. We do schema dumps and then use https://www.apgdiff.com/ to diff them. For our relatively simple schemas, it has worked fine. One thing to note: apgdiff doesn't support `ALTER TABLE ONLY [a-z\.]+ REPLICA IDENTITY FULL;` lines, which we just remove before diffing.

Cheers,
Mark
Logical replication blocking alter/drop
Hi All,

Postgres 9.6.5. We run several logical replication processes off our main Postgres server. What we've noticed is that schema changes seem to block until we halt the logical replication processes. For example, I just did a 'DROP INDEX CONCURRENTLY' command, and it just sat there until I stopped the logical replication processes (I did not have to drop the logical replication slots). Is this expected, or are we perhaps doing something wrong?

Thanks,
Mark
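P.S. In case it helps with diagnosis, a quick way to see who is waiting on whom from another session while the DDL sits there (illustrative):

    -- Which sessions hold or are waiting on locks, and on what?
    SELECT a.pid, a.state, l.mode, l.granted, l.relation::regclass, a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE l.relation IS NOT NULL
    ORDER BY l.granted, a.pid;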