Invalid byte sequence errors on DB restore

2020-03-15 Thread Samuel Smith
My current DB backup routine is just to call pg_dump and pipe to gzip. 
We recently started to get a failure to restore (which is basically just 
using psql -f on the pg_dump file) with the following errors:


invalid byte sequence for encoding "UTF8": 0xa0
 and
invalid byte sequence for encoding "UTF8": 0xd7 0x20


This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Regards,
Samuel Smith




Re: Invalid byte sequence errors on DB restore

2020-03-16 Thread Samuel Smith

On 3/16/20 2:49 AM, Laurenz Albe wrote:

On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
   and
invalid byte sequence for encoding "UTF8": 0xd7 0x20


This is on a pg 9.2.24 instance. Any tips to troubleshoot?


Simple.  Fix the offending string and upgrade.

Failure to enforce correct encoding is a bug in PostgreSQL, and a
number of such bugs have been fixed over the decades, so you might
be happier with a less ancient version.

Yours,
Laurenz Albe





Yes I thought I had seen some info on bug fixes in newer versions. I 
will pass this on to the system admins/stakeholders.


Thanks,
Samuel Smith




Re: Invalid byte sequence errors on DB restore

2020-03-16 Thread Samuel Smith

On 3/16/20 9:33 AM, Adrian Klaver wrote:

On 3/15/20 9:18 PM, Samuel Smith wrote:
My current DB backup routine is just to call pg_dump and pipe to gzip. 
We recently started to get a failure to restore (which is basically 
just using psql -f on the pg_dump file) with the following errors:


invalid byte sequence for encoding "UTF8": 0xa0
  and
invalid byte sequence for encoding "UTF8": 0xd7 0x20


This is on a pg 9.2.24 instance. Any tips to troubleshoot?


What are the locale and encodings set to for the instance and databases 
in it?




Regards,
Samuel Smith



The server is in UTF8. The file made with pg_dump used 'SQL_ASCII', but 
setting it to UTF8 (via SET client_encoding) did not help either. 
Setting the pg_dump encoding to 'latin1' seems to allow the file it 
creates to be loaded via psql -f, and everything seems to work. 
Is there any downside to setting the encoding on pg_dump to latin1?



For the record, the problem characters are:
https://www.htmlsymbols.xyz/unicode/U+00D7
and
https://www.htmlsymbols.xyz/unicode/U+00A0

But those characters appeared in many places, and not all of them were 
issues. They only fail depending on the surrounding bytes, which makes 
it complicated.


Thanks,
Samuel Smith





Re: Invalid byte sequence errors on DB restore

2020-03-19 Thread Samuel Smith

On 3/15/20 11:18 PM, Samuel Smith wrote:
My current DB backup routine is just to call pg_dump and pipe to gzip. 
We recently started to get a failure to restore (which is basically just 
using psql -f on the pg_dump file) with the following errors:


invalid byte sequence for encoding "UTF8": 0xa0
  and
invalid byte sequence for encoding "UTF8": 0xd7 0x20


This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Regards,
Samuel Smith




Our issue actually turned out to be that a couple of our production 
databases had the encoding set to SQL_ASCII while all of our development 
servers had UTF-8. This meant that in some cases where we would restore 
a production database into development (for testing or bug hunting), the 
backup file would fail to parse. A similar issue to this blog post: 
https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8


So our fix was to dump the affected production databases using the 
LATIN1 encoding for pg_dump, then destroy and recreate each database 
with its new encoding set to UTF-8. Then we could restore the data from 
the LATIN1-encoded pg_dump file.
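A rough sketch of the fix described above (database names are placeholders, and this assumes a plain-format dump against a live server):

```shell
# 1. Dump the SQL_ASCII production database, forcing LATIN1 output;
#    the plain-format dump will contain SET client_encoding = 'LATIN1';
pg_dump --encoding=LATIN1 proddb | gzip > proddb_latin1.sql.gz

# 2. Destroy and recreate the database with UTF-8 encoding
#    (template0 is needed when overriding the cluster default encoding)
dropdb proddb
createdb --encoding=UTF8 --template=template0 proddb

# 3. Restore; the server transcodes the LATIN1 input to UTF-8 on the way in
gunzip -c proddb_latin1.sql.gz | psql -d proddb
```

The transcoding happens because of the SET client_encoding line at the top of the dump: the server converts everything it receives from LATIN1 to the database's UTF-8 encoding as it loads.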


Regards,
Samuel Smith




Monitoring for long running transactions

2020-06-04 Thread Samuel Smith
We had a customer complaining of random data loss for the last 6 months 
or so. We eventually tracked it down to a combination of bad coding and 
a couple of bugs with the ORM. Basically, 'BEGIN' was being emitted by 
the web app when viewing a certain page and 'COMMIT' was never emitted 
after that. So once the app would get restarted, all data changes would 
be lost. Definitely worst case scenario.


So the question is, what is the best way to monitor for this scenario 
going forward? Are there any plugins or community recommended scripts 
already made?


Regards,






Re: Monitoring for long running transactions

2020-06-04 Thread Samuel Smith

On 6/4/20 2:29 PM, Adrian Klaver wrote:

On 6/4/20 10:00 AM, Samuel Smith wrote:
We had a customer complaining of random data loss for the last 6 
months or so. We eventually tracked it down to a combination of bad 
coding and a couple of bugs with the ORM. Basically, 'BEGIN' was being 
emitted by the web app when viewing a certain page and 'COMMIT' was 
never emitted after that. So once the app would get restarted, all 
data changes would be lost. Definitely worst case scenario.


So the question is, what is the best way to monitor for this scenario 
going forward? Are there any plugins or community recommended scripts 
already made?


https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW 



"
state text Current overall state of this backend. Possible 
values are:


     active: The backend is executing a query.

     idle: The backend is waiting for a new client command.

     idle in transaction: The backend is in a transaction, but is not 
currently executing a query.


     idle in transaction (aborted): This state is similar to idle in 
transaction, except one of the statements in the transaction caused an 
error.


     fastpath function call: The backend is executing a fast-path function.

     disabled: This state is reported if track_activities is disabled in 
this backend.

"



Regards,




Sorry, I should have clarified that I was aware of the pg_stat_activity 
view. That is how we found the problem in the first place. And yes, I 
could just write a bash script and run it in cron. I just didn't know if 
there was a more "official" way to go about this, since it is probably a 
common monitoring point, and/or if something like this was already made.
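A minimal cron-able check might look like this (the database name and the 5-minute threshold are assumptions to adjust for your environment); it lists backends that have held a transaction open without doing any work:

```shell
# Report sessions idle in a transaction for more than 5 minutes.
# -X skips ~/.psqlrc, -At gives unaligned tuple-only output for scripting.
psql -X -At -d mydb -c "
SELECT pid, usename, now() - xact_start AS open_for, query
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;"
```

From PostgreSQL 9.6 on, the idle_in_transaction_session_timeout setting can terminate such sessions server-side; on 9.2 an external check like this is the usual approach.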


Regards,