Invalid byte sequence errors on DB restore
My current DB backup routine is just to call pg_dump and pipe to gzip. We recently started to get a failure to restore (which is basically just using psql -f on the pg_dump file) with the following errors: invalid byte sequence for encoding "UTF8": 0xa0 and invalid byte sequence for encoding "UTF8": 0xd7 0x20 This is on a pg 9.2.24 instance. Any tips to troubleshoot? Regards, Samuel Smith
Re: Invalid byte sequence errors on DB restore
On 3/16/20 2:49 AM, Laurenz Albe wrote: On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote: My current DB backup routine is just to call pg_dump and pipe to gzip. We recently started to get a failure to restore (which is basically just using psql -f on the pg_dump file) with the following errors: invalid byte sequence for encoding "UTF8": 0xa0 and invalid byte sequence for encoding "UTF8": 0xd7 0x20 This is on a pg 9.2.24 instance. Any tips to troubleshoot? Simple. Fix the offending string and upgrade. Failure to enforce correct encoding is a bug in PostgreSQL, and a number of such bugs have been fixed over the decades, so you might be happier with a less ancient version. Yours, Laurenz Albe Yes I thought I had seen some info on bug fixes in newer versions. I will pass this on to the system admins/stakeholders. Thanks, Samuel Smith
Re: Invalid byte sequence errors on DB restore
On 3/16/20 9:33 AM, Adrian Klaver wrote: On 3/15/20 9:18 PM, Samuel Smith wrote: My current DB backup routine is just to call pg_dump and pipe to gzip. We recently started to get a failure to restore (which is basically just using psql -f on the pg_dump file) with the following errors: invalid byte sequence for encoding "UTF8": 0xa0 and invalid byte sequence for encoding "UTF8": 0xd7 0x20 This is on a pg 9.2.24 instance. Any tips to troubleshoot? What are the locale and encodings set to for the instance and databases in it? Regards, Samuel Smith The server is in UTF8. The file made with pg_dump used 'SQL_ASCII', but setting it to UTF8 (via SET client_encoding ) did not help either. Having the pg_dump encoding set to 'latin1' seems to allow the file created it by it to be loaded via psql -f and everything seems to work. Is there any bad side to setting the encoding on pg_dump to latin1? For the record, the problem characters are: https://www.htmlsymbols.xyz/unicode/U+00D7 and https://www.htmlsymbols.xyz/unicode/U+00A0 But those characters were in many places and not all were issues. They only fail depending on the characters that precede it which makes it complicated. Thanks, Samuel Smith
Re: Invalid byte sequence errors on DB restore
On 3/15/20 11:18 PM, Samuel Smith wrote: My current DB backup routine is just to call pg_dump and pipe to gzip. We recently started to get a failure to restore (which is basically just using psql -f on the pg_dump file) with the following errors: invalid byte sequence for encoding "UTF8": 0xa0 and invalid byte sequence for encoding "UTF8": 0xd7 0x20 This is on a pg 9.2.24 instance. Any tips to troubleshoot? Regards, Samuel Smith Our issue actually turned out to be that a couple of our production database had the encoding set to SQL_ASCII while all of our development servers had UTF-8. This meant in some cases where we would restore a production database into development (for testing or bug hunting), there would be a failure to parse the backup file. A similar issue to this blog post: https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8 So our fix was to dump the affected production databases using the LATIN1 encoding for pg_dump followed by destroying and recreating the database and setting its new encoding to UTF-8. Then we could restore the data using the pg_dump file with LATIN1 encoding. Regards, Samuel Smith
Monitoring for long running transactions
We had a customer complaining of random data loss for the last 6 months or so. We eventually tracked it down to a combination of bad coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by the web app when viewing a certain page and 'COMMIT' was never emitted after that. So once the app would get restarted, all data changes would be lost. Definitely worst case scenario. So the question is, what is the best way to monitor for this scenario going forward? Are there any plugins or community recommended scripts already made? Regards,
Re: Monitoring for long running transactions
On 6/4/20 2:29 PM, Adrian Klaver wrote: On 6/4/20 10:00 AM, Samuel Smith wrote: We had a customer complaining of random data loss for the last 6 months or so. We eventually tracked it down to a combination of bad coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by the web app when viewing a certain page and 'COMMIT' was never emitted after that. So once the app would get restarted, all data changes would be lost. Definitely worst case scenario. So the question is, what is the best way to monitor for this scenario going forward? Are there any plugins or community recommended scripts already made? https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW " state text Current overall state of this backend. Possible values are: active: The backend is executing a query. idle: The backend is waiting for a new client command. idle in transaction: The backend is in a transaction, but is not currently executing a query. idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error. fastpath function call: The backend is executing a fast-path function. disabled: This state is reported if track_activities is disabled in this backend. " Regards, Sorry, I should have clarified that I was aware of the pg_stat_activity table. That is how we found the problem in the first place. And yes I could just write a bash script and run it in cron. I just didn't know if there was a more "official" way to go about this since it is probably a common monitoring point and/or if something like this was already made. Regards,