Hi

On Mon, Jul 22, 2024 at 1:02 PM Sandeep Thakkar <
sandeep.thak...@enterprisedb.com> wrote:

>
>
> On Mon, Jul 22, 2024 at 5:21 PM Sandeep Thakkar <
> sandeep.thak...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> EDB's Windows installer gets the locales on the system using
>> https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/scripts/windows/getlocales/getlocales.cpp
>> and then substitutes some patterns (
>> https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/pgserver.xml.in#L2850).
>> I'm not sure why we do that; it is old code, and @Dave
>> Page <dave.p...@enterprisedb.com> may know. I'm also not sure whether that
>> piece of code is responsible for the change in encoding in this case.
>>
>
It was to work around limitations in the way we could return data from an
external program to BitRock InstallBuilder. I forget the precise details as
it was something like 15 years ago, but essentially BitRock couldn't read
output that contained (certain?) non-alphanumeric characters, so I had to
do that crazy encode/decode dance.
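
To illustrate the general idea only (this does not reproduce the actual
getlocales.cpp or pgserver.xml.in logic), a minimal Python sketch of such a
round trip might look like the following. The `_uXXXX_` token format and the
function names are hypothetical, chosen just for this example:

```python
import re

# Hypothetical token format: every character outside ASCII [A-Za-z0-9]
# becomes _uXXXX_, where XXXX is the hex code point. This is an assumption
# for illustration, not the scheme the installer really uses.
_TOKEN = re.compile(r"_u([0-9A-F]{4,6})_")


def encode_locale(name: str) -> str:
    """Return an ASCII-alphanumeric-plus-underscore form of a locale name."""
    return "".join(
        ch if (ch.isascii() and ch.isalnum()) else f"_u{ord(ch):04X}_"
        for ch in name
    )


def decode_locale(encoded: str) -> str:
    """Reverse encode_locale by turning each token back into its character."""
    return _TOKEN.sub(lambda m: chr(int(m.group(1), 16)), encoded)
```

Because a literal underscore is itself encoded, every underscore in the
encoded string belongs to a token, so decoding is unambiguous. The caller
would run the decode step after reading the external program's output.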


>
>> When I checked the installation log shared by Ertan, I saw that the
>> locale passed to the initcluster script is the same as the one returned
>> by the getlocales executable.
>>
>> Executing C:\Windows\System32\cscript //NoLogo "C:\Program
>> Files\PostgreSQL\16/installer/server/initcluster.vbs" "NT
>> AUTHORITY\NetworkService" "postgres" "****"
>> "C:\Users\User1\AppData\Local\Temp/postgresql_installer_cd79fad8b7"
>> "C:\Program Files\PostgreSQL\16" "C:\DATA_PG16" 5432 "Turkish,Türkiye" 0
>>
> Apologies for the top posting. Please ignore this thread. I've replied to
> another thread.
>
>
>> On Mon, Jul 22, 2024 at 6:43 AM Thomas Munro <thomas.mu...@gmail.com>
>> wrote:
>>
>>> On Mon, Jul 22, 2024 at 11:58 AM Ertan Küçükoglu
>>> <ertan.kucuko...@gmail.com> wrote:
>>> > On Sun, Jul 21, 2024 at 11:27 PM, Thomas Munro
>>> > <thomas.mu...@gmail.com> wrote:
>>> >> 2.  Some existing database clusters which had been installed with the
>>> >> name "Turkish_Turkey.1254" became unstartable when the OS upgrade
>>> >> renamed that locale to "Turkish_Türkiye.1254".  I'm trying to provide
>>> >> a pathway[2] to fix such systems in core PostgreSQL in the next minor
>>> >> release.  Everyone affected probably already found another way but at
>>> >> least next time a country is renamed this might help with the next
>>> >> point too.
>>> >
>>> > I was also hit by that OS update.
>>> > There is a Microsoft tool for creating a locale installer
>>> > https://www.microsoft.com/en-us/download/details.aspx?id=41158
>>> > Using that tool to add a second locale, Turkish_Turkey.1254 (the name
>>> > before the Microsoft update), to the OS can fix your broken PostgreSQL.
>>> > I believe most people simply choose this path.
>>> > There are also several blogs/articles written in Turkish about the
>>> > problem.
>>>
>>> If that's easy and good enough then maybe I should abandon that
>>> on-the-fly renaming patch and we should just do a little documentation
>>> note...
>>>
>>> >> 3.  I'd also like to teach initdb to use BCP47 names like "tr-TR"
>>> >> instead of those names by default (ie if you don't specify a locale
>>> >> name explicitly), and have proposed that before[3] but it hasn't gone
>>> >> in due to lack of testing/reviews from Windows users.  It seems like
>>> >> that doesn't matter much in practice to all the people using the
>>> >> popular EDB installer, since it apparently takes control of picking
>>> >> the locale and explicitly passes it in (and screws up the encoding as
>>> >> we have now learned).
>>> >
>>> > If I am not mistaken BCP47 names are already used in Linux systems.
>>> > Using them would make PostgreSQL use the same locale names across
>>> > Linux and Windows systems.
>>>
>>> Not exactly.  POSIX systems use
>>> [language[_territory][.codeset][@modifier]], but POSIX doesn't say
>>> what any of those components are[1] (are they ISO country codes?
>>> English words?  Hieroglyphs?), so, curiously, those Windows names like
>>> "English_United States.1252" are probably POSIX-conforming.  Every
>>> real POSIX system of course uses ISO language and country codes these
>>> days (though I still recall other names being used years ago), so they
>>> look similar to the simpler kinds of BCP47 tags, which are just
>>> language-country with the same ISO codes but a different separator.
>>> They diverge further once you get into the finer points with more
>>> components.  Incidentally that lack of standardisation is the reason
>>> you can't say that the glibc ".utf8" ending is "wrong", even though it
>>> is obviously stupid :-p (all systems I know accept .UTF-8, 'cause
>>> that's what Ken Thompson, Rob Pike and the Unicode standard called
>>> it).  I suspect that Windows accepts the POSIX style en_US too, but
>>> it's not what the manual tells you to use.
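
As a rough illustration of the POSIX pattern quoted above, here is a small
Python sketch that splits a name into its four components. The regular
expression, the `PosixLocale` type and the `to_simple_tag` helper are
assumptions made for this example; they are not taken from PostgreSQL or
any OS library, and the tag conversion is only meaningful when the
components really are ISO codes:

```python
import re
from typing import NamedTuple, Optional


class PosixLocale(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_territory][.codeset][@modifier]; POSIX does not define what the
# components contain, so each part is matched as "anything up to the next
# separator".
_POSIX = re.compile(
    r"^(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_posix_locale(name: str) -> PosixLocale:
    """Split a POSIX-style locale name into its components."""
    m = _POSIX.match(name)
    if m is None:
        raise ValueError(f"not a POSIX-style locale name: {name!r}")
    return PosixLocale(**m.groupdict())


def to_simple_tag(loc: PosixLocale) -> str:
    """Join language and territory with '-', dropping codeset and modifier."""
    return loc.language if loc.territory is None else f"{loc.language}-{loc.territory}"
```

With this sketch, both "tr_TR.UTF-8" and "English_United States.1252" parse
successfully, which matches the point that the Windows-style names fit the
POSIX pattern even though their components are English words.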
>>>
>>> But really we shouldn't have to know or care how locales are named; we
>>> should get the names from the OS in the first place, and then we
>>> should remember them and give them back to the OS at the right times.
>>> The problem here is that Windows has two kinds of locale name, one
>>> unstable over time and with illegal (for us) characters in the name,
>>> and one stable; we need to find all the places where the old unstable
>>> ones can get into our system, and block them off.  I'm aware of two
>>> places now: the EDB installer, and initdb's default for people who run
>>> it on the command line without giving an explicit name.
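
A hedged sketch of what "blocking them off" could mean at an entry point:
classify an incoming locale name before storing it. The function name, the
category labels and the tag pattern below are hypothetical, and the pattern
is a loose heuristic, not a full BCP 47 validation:

```python
import re

# Loose shape of a simple language tag such as "tr-TR" or "en-US".
# This is a heuristic assumption for illustration only.
_SIMPLE_TAG = re.compile(r"^[A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*$")


def classify_locale_name(name: str) -> str:
    """Return 'non-ascii', 'tag-like' or 'other' for a locale name."""
    if not name.isascii():
        # Names such as "Turkish_Türkiye.1254" carry non-ASCII characters.
        return "non-ascii"
    if _SIMPLE_TAG.match(name):
        return "tag-like"
    return "other"
```

An entry point could reject or translate anything that is not "tag-like";
which policy is right is a separate question that this sketch does not
settle.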
>>>
>>> > I can help with the testing part. Let me know the details, please.
>>>
>>> Thanks!  I will rebase that patch, and CC you on the thread.
>>>
>>> [1]
>>> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
>>>
>>
>>
>> --
>> Sandeep Thakkar
>>
>>
>>

-- 
Dave Page
VP, Chief Architect, Database Infrastructure
EDB: https://www.enterprisedb.com
