Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-23 Thread Tomas Pospisek

On 22.06.22 22:18, Tomas Pospisek wrote:

On 22.06.22 21:25, Adrian Klaver wrote:

On 6/22/22 12:17, Tomas Pospisek wrote:

Hi all,

while doing `cat pg_dump.dump | psql` I get the above message. Note 
that `pg_dump.dump` contains:


 CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING 
= 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';


What exactly is the problem? I understand that en_US.UTF-8 and 
en_US.utf-8 are not *exactly* the same string.


However I do not understand how the difference came to be. And I do 
not know what the "right" way is and how to proceed from here.


If I `pg_dump --create` some DB on the new server 
(13.7-1.pgdg18.04+1) I get:


 CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';

When I do the same on the old server (12.8-1.pgdg20.04+1) I get:

 CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
LC_CTYPE = 'en_US.UTF-8';


Any hints or help?


Are you dumping/restoring from one version of Postgres to another?


Yes, indeed!

If from older to newer, then use the newer version of pg_dump (13) to dump 
the older (12) database. Then use the version 13 pg_restore to load the 
version 13 database.


I will. Thanks a lot Adrian!


So I used both pg_dump and pg_restore from the newer machine. Result is 
still the same.  So I'll use Tom Lane's suggestion too and fix the 
'UTF-8' spelling in the dump file:


Tom Lane wrote:

> This is probably more about dumping from different operating systems.
> The spelling of the locale name is under the control of the OS,
> and Postgres doesn't know very much about the semantics of it
> (so I think we conservatively assume that any difference in
> spelling is significant).
>
> Best bet might be to edit the dump file to adjust the locale
> spellings to match your new system.
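For the record, the adjustment Tom describes can be sketched like this (the locale spellings are the ones from this thread; the helper function and the idea of doing it in Python are my own, not anything pg_dump provides):

```python
# Sketch: normalize the locale spelling in a plain-format dump so it
# matches the spelling the new system reports (here: en_US.utf-8).

def fix_locale_spelling(sql: str) -> str:
    # pg_dump emitted the old OS's spelling of the locale name;
    # rewrite it to the spelling the new OS uses.
    return sql.replace("'en_US.UTF-8'", "'en_US.utf-8'")

line = ("CREATE DATABASE some_db WITH TEMPLATE = my_own_template "
        "ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' "
        "LC_CTYPE = 'en_US.UTF-8';")
print(fix_locale_spelling(line))
```

Applied over the whole dump file (or done with a one-liner in your editor of choice), this makes the CREATE DATABASE options match the template database.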

Many thanks to both Tom & Adrian!!!
*t




Differences in Escaped bytea's when creating a plain pg_dump

2022-06-23 Thread WR

Hello community,

I'm having some trouble restoring a plain-text pg_dump.
The Postgres version is 13 x64 running on Windows 10 x64, installed from 
the EDB package.


The database has bytea_output = 'escape' set because some processing 
software needs it for historical reasons.


Dump command is:

pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8 
--schema=public --host=localhost --username=myuser --inserts dbname


We have two tables that each contain a bytea column.

But when I look at the dump file there is a difference between the 
escaped bytea strings. (Note: both INSERTs are from the same run of 
pg_dump, in the same dump file.)



SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

[snip]

INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 
2000, 2400, 2400, 
'\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000', 
500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1);


[snip]

INSERT INTO public.profiles VALUES (1, 1, 's', 152, 
'\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000', 
'2016-08-25 00:00:00+02');


[snip]

When I restore them via pgadmin4 query tool, both INSERTS do work.

But when I read them from the file with my C++ software and send them 
with the pqxx library as a transaction, the first bytea string generates 
fault 21020 (0x00 is not a valid UTF-8 sequence). I also checked the 
read string in the C++ debugger; the single backslashes in the one case 
and the double backslashes in the other case are there.


So my questions are:
Why do we get one bytea string with double backslashes (which works) and 
another one with single backslashes (which does not work with pqxx 
transactions)?


Can I somehow convince pg_dump to generate double backslashes in all 
bytea strings?


Why does pgAdmin understand both formats while a pqxx transaction does not?
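For context, my understanding of the escape output format in the dump (a sketch of my own, not PostgreSQL's code, and it ignores quoting of the single-quote character): printable ASCII bytes pass through unchanged, a backslash byte becomes \\, and everything else becomes a three-digit octal escape.

```python
# Illustration: render a byte string the way bytea "escape" output
# appears inside a standard_conforming_strings=on dump literal.

def bytea_escape(data: bytes) -> str:
    out = []
    for b in data:
        if b == 0x5C:              # a literal backslash byte -> \\
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:    # printable ASCII stays as-is
            out.append(chr(b))
        else:                      # everything else -> \nnn octal escape
            out.append("\\%03o" % b)
    return "".join(out)

print(bytea_escape(b"\x00\x00&\x01"))   # -> \000\000&\001
```

That matches the single-backslash form in the first INSERT above, which is why I suspect the difference lies elsewhere.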


Thank you for this great database-system. I really like it.
Wolfgang



--
May the source be with you




Re: Differences in Escaped bytea's when creating a plain pg_dump

2022-06-23 Thread WR

On 23.06.2022 at 17:13, WR wrote:

[snip]





Hello again,

I've found one mistake in the data of the second table 
(public.profiles): it really was "double escaped" somehow, so it is no 
longer valid.


So pg_dump does not treat the two tables differently after all. The 
only valid data is from the table (public.oned_figures) with single 
backslashes. That was my fault, sorry.


But one question is left.

I read this valid data into a C++ std::string (and I checked that the 
single backslashes are still there). Why can't I put this SQL command 
into a pqxx transaction and execute it? It looks like the pqxx 
transaction unescapes the bytea string and then finds the 0x00 bytes, 
which are not allowed in text strings but should be allowed in bytea strings.
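My guess at the mechanism, sketched in Python purely as an illustration (the helper is mine, not pqxx's API): if any layer between the file and the server applies one extra round of backslash unescaping, the \000 escapes collapse into raw NUL bytes, and the server then rejects the query text as invalid UTF-8.

```python
# Sketch: one extra round of backslash-unescaping turns the textual
# octal escapes into raw bytes before the query reaches the server.

def unescape_once(s: str) -> bytes:
    # Interpret \nnn octal escapes and \\ the way the bytea input
    # function (or a non-standard-conforming string literal) would.
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "\\":
            if s[i + 1] == "\\":
                out.append(0x5C)        # \\ -> one backslash byte
                i += 2
            else:
                out.append(int(s[i + 1:i + 4], 8))  # \nnn -> byte
                i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)

raw = unescape_once(r"\000\000&\001")
print(raw)        # b'\x00\x00&\x01' -- now contains raw NUL bytes
print(0 in raw)   # True: NUL is invalid inside UTF-8 query text
```

So the question is really which layer in my pipeline does that extra unescaping.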




--
May the source be with you




Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-23 Thread Adrian Klaver

On 6/23/22 00:37, Tomas Pospisek wrote:

On 22.06.22 22:18, Tomas Pospisek wrote:

On 22.06.22 21:25, Adrian Klaver wrote:

On 6/22/22 12:17, Tomas Pospisek wrote:




So I used both pg_dump and pg_restore from the newer machine. Result is 
still the same.  So I'll use Tom Lane's suggestion too and fix the 
'UTF-8' spelling in the dump file:


I'm not sure why that is necessary. It seems this is low-hanging fruit 
that could be dealt with by the equivalent of lower('en_US.UTF-8') = 
lower('en_US.utf-8').









--
Adrian Klaver
adrian.kla...@aklaver.com




Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-23 Thread Adrian Klaver

On 6/23/22 10:11, Adrian Klaver wrote:

On 6/23/22 00:37, Tomas Pospisek wrote:

On 22.06.22 22:18, Tomas Pospisek wrote:

On 22.06.22 21:25, Adrian Klaver wrote:

On 6/22/22 12:17, Tomas Pospisek wrote:




So I used both pg_dump and pg_restore from the newer machine. Result 
is still the same.  So I'll use Tom Lane's suggestion too and fix the 
'UTF-8' spelling in the dump file:


I'm not sure why that is necessary. It seems this is low-hanging fruit 
that could be dealt with by the equivalent of lower('en_US.UTF-8') = 
lower('en_US.utf-8').


Well that was clear as mud. My point was that I don't see why the end 
user should have to do this when it could be handled internally in the 
pg_restore code.







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Differences in Escaped bytea's when creating a plain pg_dump

2022-06-23 Thread Daniel Verite
WR wrote:

> But one question is left.
> 
> When I read this valid data into a c++ std::string (and I checked that 
> the single backslashes are still there). Why can't I put this 
> SQL-command to a pqxx-transaction and execute it. It looks like the 
> pqxx-transaction unescapes the bytea-string and then it finds the 0x00 
> bytes, which are not allowed in text-strings but should be in bytea-strings.

You may check your server logs. They would have the error messages
with the queries as received by the server.

Note that passing '\000'::bytea with a single backslash can be
incorrect if standard_conforming_strings is set to off (not
recommended since version 8.1).
In that case the backslashes need to be doubled.

Example:

postgres=# set standard_conforming_strings to off;
SET

postgres=# set escape_string_warning to off;
SET

postgres=# select '\000'::bytea;
ERROR:  invalid byte sequence for encoding "UTF8": 0x00

postgres=# select '\\000'::bytea;
 bytea 
-------
 \x00
(1 row)


There might also be a problem with how the string is being fed with
the C++ code, but you'd need to show us the code to get feedback
on it.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite