Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)
On 22.06.22 22:18, Tomas Pospisek wrote: On 22.06.22 21:25, Adrian Klaver wrote: On 6/22/22 12:17, Tomas Pospisek wrote: Hi all, while doing `cat pg_dump.dump | psql` I get the above message. Note that `pg_dump.dump` contains: CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8'; What is exactly the problem? I understand that en_US.UTF-8 and en_US.utf-8 is not *exactly* the same string. However I do not understand how the difference came to be. And I do not know what the "right" way is and how to proceed from here. If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) I get: CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8'; When I do the same on the old server (12.8-1.pgdg20.04+1) I get: CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8'; Any hints or help? Are dumping/restoring from one version of Postgres to another? Yes, indeed! If from older to newer then use the new version of pg_dump(13) to dump the older(12) database. Then the 13 version of restore to load the version 13 database. I will. Thanks a lot Adrian! So I used both pg_dump and pg_restore from the newer machine. Result is still the same. So I'll use Tom Lane's suggestion too and fix the 'UTF-8' spelling in the dump file: Tom Lane wrote: > This is probably more about dumping from different operating systems. > The spelling of the locale name is under the control of the OS, > and Postgres doesn't know very much about the semantics of it > (so I think we conservatively assume that any difference in > spelling is significant). > > Best bet might be to edit the dump file to adjust the locale > spellings to match your new system. Many thanks to both Tom & Adrian!!! *t
Differences in Escaped bytea's when creating a plain pg_dump
Hello community, I've some trouble in restoring a plain text pg_dump. Postgres version is 13 x64 running on Windows10x64 installed from EDB package. The database has bytea_output = 'escape' option because of some processing software needs it for historical reasons. Dump command is: pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8 --schema=public --host=localhost --username=myuser --inserts dbname We have two tables that have a bytea-row. But when I look at the dumpfile there is a difference between the escaped bytea-string. (note: both INSERT's from the same run of pg_dump in the dumpfile) SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET xmloption = content; SET client_min_messages = warning; SET row_security = off; [snip] INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1); [snip] INSERT INTO public.profiles VALUES (1, 1, 's', 152, '\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000', '2016-08-25 00:00:00+02'); [snip] When I restore them via pgadmin4 query tool, both INSERTS do work. But when I read them with my c++ written software from file and send them with pqxx-Library as a transaction, the first bytea-string generates a fault 21020. (0x00 is not a valid utf8 sequence). I also checked the read string in c++ debugger, the single backslashes in the one case and the double backslashes in the other case are there. So my questions are: Why do we get one bytea-string with double backslashes (which works) and another one with single backslashes (which actually not works with pqxx-transactions)? Can I convince pg_dump somehow, to generate double backslashes in all bytea-strings? Why does pgadmin understand both formats. pqxx-transaction does not? Thank you for this great database-system. I really like it. Wolfgang -- May the source be with you
Re: Differences in Escaped bytea's when creating a plain pg_dump
Am 23.06.2022 um 17:13 schrieb WR: Hello community, I've some trouble in restoring a plain text pg_dump. Postgres version is 13 x64 running on Windows10x64 installed from EDB package. The database has bytea_output = 'escape' option because of some processing software needs it for historical reasons. Dump command is: pg_dump --file=mydump.sql --format=plain --verbose --encoding=UTF8 --schema=public --host=localhost --username=myuser --inserts dbname We have two tables that have a bytea-row. But when I look at the dumpfile there is a difference between the escaped bytea-string. (note: both INSERT's from the same run of pg_dump in the dumpfile) SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET xmloption = content; SET client_min_messages = warning; SET row_security = off; [snip] INSERT INTO public.oned_figures VALUES (1, 'Figure_Wolle1', 476, -476, 2000, 2400, 2400, '\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000&\001\334\001\334\001&\001\000\000\332\376$\376$\376\332\376\000\000', 500, 0, 'sinus(0|0|0;30;5;0;0,5;0)', '2021-08-31 11:53:22.442801', 0, 1); [snip] INSERT INTO public.profiles VALUES (1, 1, 's', 152, '\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000\\037\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000!\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000\\000\\000"\\003\\000', '2016-08-25 00:00:00+02'); [snip] When I restore them via pgadmin4 query tool, both INSERTS do work. But when I read them with my c++ written software from file and send them with pqxx-Library as a transaction, the first bytea-string generates a fault 21020. (0x00 is not a valid utf8 sequence). I also checked the read string in c++ debugger, the single backslashes in the one case and the double backslashes in the other case are there. So my questions are: Why do we get one bytea-string with double backslashes (which works) and another one with single backslashes (which actually not works with pqxx-transactions)? Can I convince pg_dump somehow, to generate double backslashes in all bytea-strings? Why does pgadmin understand both formats. pqxx-transaction does not? Thank you for this great database-system. I really like it. Wolfgang Hello again, I've found one mistake in the data of the second table (public.profiles). They seem to be really "double escaped" somehow. So they are not valid anymore. Now I know pg_dump doesn't make any difference between the two tables. The only valid data is from table (public.oned_figures) with one backslash. That was my fault, sorry. But one question is left. When I read this valid data into a c++ std::string (and I checked that the single backslashes are still there). Why can't I put this SQL-command to a pqxx-transaction and execute it. It looks like the pqxx-transaction unescapes the bytea-string and then it finds the 0x00 bytes, which are not allowed in text-strings but should be in bytea-strings. -- May the source be with you
Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)
On 6/23/22 00:37, Tomas Pospisek wrote: On 22.06.22 22:18, Tomas Pospisek wrote: On 22.06.22 21:25, Adrian Klaver wrote: On 6/22/22 12:17, Tomas Pospisek wrote: So I used both pg_dump and pg_restore from the newer machine. Result is still the same. So I'll use Tom Lane's suggestion too and fix the 'UTF-8' spelling in the dump file: Not sure why that is necessary? Is seems this is low hanging fruit that could dealt with by the equivalent of lower('en_US.UTF-8') = lower('en_US.utf-8'). Tom Lane wrote: > This is probably more about dumping from different operating systems. > The spelling of the locale name is under the control of the OS, > and Postgres doesn't know very much about the semantics of it > (so I think we conservatively assume that any difference in > spelling is significant). > > Best bet might be to edit the dump file to adjust the locale > spellings to match your new system. Many thanks to both Tom & Adrian!!! *t -- Adrian Klaver adrian.kla...@aklaver.com
Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)
On 6/23/22 10:11, Adrian Klaver wrote: On 6/23/22 00:37, Tomas Pospisek wrote: On 22.06.22 22:18, Tomas Pospisek wrote: On 22.06.22 21:25, Adrian Klaver wrote: On 6/22/22 12:17, Tomas Pospisek wrote: So I used both pg_dump and pg_restore from the newer machine. Result is still the same. So I'll use Tom Lane's suggestion too and fix the 'UTF-8' spelling in the dump file: Not sure why that is necessary? Is seems this is low hanging fruit that could dealt with by the equivalent of lower('en_US.UTF-8') = lower('en_US.utf-8'). Well that was clear as mud. My point was that I don't see why the end user should have to do this when it could be handled internally in the pg_restore code. -- Adrian Klaver adrian.kla...@aklaver.com
Re: Differences in Escaped bytea's when creating a plain pg_dump
WR wrote: > But one question is left. > > When I read this valid data into a c++ std::string (and I checked that > the single backslashes are still there). Why can't I put this > SQL-command to a pqxx-transaction and execute it. It looks like the > pqxx-transaction unescapes the bytea-string and then it finds the 0x00 > bytes, which are not allowed in text-strings but should be in bytea-strings. You may check your server logs. They would have the error messages with the queries as received by the server. Note that passing '\000'::bytea with a single backslash can be incorrect if standard_conforming_strings if set to off (not recommended since version 8.1). In that case the backslashes need to be doubled. Example: postgres=# set standard_conforming_strings to off; SET postgres=# set escape_string_warning to off; SET postgres=# select '\000'::bytea; ERROR: invalid byte sequence for encoding "UTF8": 0x00 postgres=# select '\\000'::bytea; bytea --- \x00 (1 row) There might also be a problem with how the string is being fed with the C++ code, but you'd need to show us the code to get feedback on it. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite