Hi

On Mon, Jul 22, 2024 at 1:02 PM Sandeep Thakkar <sandeep.thak...@enterprisedb.com> wrote:
>
> On Mon, Jul 22, 2024 at 5:21 PM Sandeep Thakkar <sandeep.thak...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> EDB's windows installer gets the locales on the system using the
>> https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/scripts/windows/getlocales/getlocales.cpp
>> and
>> then substitute some patterns (
>> https://github.com/EnterpriseDB/edb-installers/blob/REL-16/server/pgserver.xml.in#L2850)
>> I'm not sure why we do that but that is the old code and probably @Dave
>> Page <dave.p...@enterprisedb.com> may know but I'm not sure if that
>> piece of code is responsible for this change in encoding in this case.
>>
>

It was to work around limitations in the way we could return data from an
external program to BitRock InstallBuilder. I forget the precise details as
it was something like 15 years ago, but essentially BitRock couldn't read
output that contained (certain?) non-alphanumeric characters, so I had to
do that crazy encode/decode dance.

>
>> When I checked the installation log shared by Ertan, I do see that the
>> locale passed to initcluster script is the same as returned by the
>> getlocales executable.
>>
>> Executing C:\Windows\System32\cscript //NoLogo "C:\Program
>> Files\PostgreSQL\16/installer/server/initcluster.vbs" "NT
>> AUTHORITY\NetworkService" "postgres" "****"
>> "C:\Users\User1\AppData\Local\Temp/postgresql_installer_cd79fad8b7"
>> "C:\Program Files\PostgreSQL\16" "C:\DATA_PG16" 5432 "Turkish,Türkiye" 0
>>
>>
> Apology about the top posting. Please ignore this thread. I've replied to
> another thread.
>
>
>> On Mon, Jul 22, 2024 at 6:43 AM Thomas Munro <thomas.mu...@gmail.com>
>> wrote:
>>
>>> On Mon, Jul 22, 2024 at 11:58 AM Ertan Küçükoglu
>>> <ertan.kucuko...@gmail.com> wrote:
>>> > Thomas Munro <thomas.mu...@gmail.com> wrote on Sun, 21 Jul 2024 at
>>> > 23:27:
>>> >> 2. Some existing database clusters which had been installed with the
>>> >> name "Turkish_Turkey.1254" became unstartable when the OS upgrade
>>> >> renamed that locale to "Turkish_Türkiye.1254". I'm trying to provide
>>> >> a pathway[2] to fix such systems in core PostgreSQL in the next minor
>>> >> release. Everyone affected probably already found another way but at
>>> >> least next time a country is renamed this might help with the next
>>> >> point too.
>>> >
>>> > I was also hit by that OS update.
>>> > There is a Microsoft tool for creating a locale installer
>>> > https://www.microsoft.com/en-us/download/details.aspx?id=41158
>>> > Using that tool and adding a second locale Turkish_Turkey.1254 (name
>>> > before Microsoft update) in the OS can fix your broken PostgreSQL.
>>> > I believe most people simply choose this path.
>>> > There are also several blogs/articles written in Turkish about the
>>> > problem.
>>>
>>> If that's easy and good enough then maybe I should abandon that
>>> on-the-fly renaming patch and we should just do a little documentation
>>> note...
>>>
>>> >> 3. I'd also like to teach initdb to use BCP47 names like "tr-TR"
>>> >> instead of those names by default (ie if you don't specify a locale
>>> >> name explicitly), and have proposed that before[3] but it hasn't gone
>>> >> in due to lack of testing/reviews from Windows users. It seems like
>>> >> that doesn't matter much in practice to all the people using the
>>> >> popular EDB installer, since it apparently takes control of picking
>>> >> the locale and explicitly passes it in (and screws up the encoding as
>>> >> we have now learned).
>>> >
>>> > If I am not mistaken BCP47 names are already used in Linux systems.
>>> > Using them would make PostgreSQL use the same locale names across
>>> > Linux and Windows systems.
>>>
>>> Not exactly.
>>> POSIX systems use
>>> [language[_territory][.codeset][@modifier]], but POSIX doesn't say
>>> what any of those components are[1] (are they ISO country codes?
>>> English words? Hieroglyphs?), so, curiously, those Windows names like
>>> "English_United States.1252" are probably POSIX-conforming. Every
>>> real POSIX system of course uses ISO language and country codes these
>>> days (though I still recall other names being used years ago), so they
>>> look similar to the simpler kinds of BCP47 tags, which are just
>>> language-country with the same ISO codes but a different separator.
>>> They diverge further once you get into the finer points with more
>>> components. Incidentally that lack of standardisation is the reason
>>> you can't say that the glibc ".utf8" ending is "wrong", even though it
>>> is obviously stupid :-p (all systems I know accept .UTF-8, 'cause
>>> that's what Ken Thompson, Rob Pike and the Unicode standard called
>>> it). I suspect that Windows accepts the POSIX style en_US too, but
>>> it's not what the manual tells you to use.
>>>
>>> But really we shouldn't have to know or care how locales are named; we
>>> should get the names from the OS in the first place, and then we
>>> should remember them and give them back to the OS at the right times.
>>> The problem here is that Windows has two kinds, one unstable over
>>> time and with illegal (for us) characters in the name, and one stable;
>>> we need to find all the places where the old unstable ones can get
>>> into our system, and block them off. I'm aware of two places now: the
>>> EDB installer, and initdb's default for people who run it on the
>>> command line without giving an explicit name.
>>>
>>> > I can help with the testing part. Let me know the details, please.
>>>
>>> Thanks! I will rebase that patch, and CC you on the thread.
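
To make the naming point above concrete, here is a minimal Python sketch. It
is an illustration only, not code from PostgreSQL or the EDB installer, and
the names parse_locale_name and to_simple_bcp47 are hypothetical. It assumes
the only structure POSIX gives us is the separators in
language[_territory][.codeset][@modifier], and that a "simple" BCP47 tag is
just language-country with ISO-looking codes.

```python
import re
from typing import NamedTuple, Optional

# Shape quoted in the thread: language[_territory][.codeset][@modifier].
# POSIX does not say what each component may contain, so this pattern
# only splits on the separators and accepts anything in between.
_LOCALE_RE = re.compile(
    r"^(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale_name(name: str) -> Optional[LocaleName]:
    """Split a locale name on the POSIX separators (hypothetical helper)."""
    m = _LOCALE_RE.match(name)
    if m is None:
        return None
    return LocaleName(**m.groupdict())


def to_simple_bcp47(name: str) -> Optional[str]:
    """Return 'll-CC' only if both parts already look like ISO codes.

    Assumption: a two- or three-letter lowercase language and a two-letter
    uppercase territory. Anything else (for example the long Windows
    names) is reported as not convertible rather than guessed at.
    """
    parsed = parse_locale_name(name)
    if parsed is None or parsed.territory is None:
        return None
    if re.fullmatch(r"[a-z]{2,3}", parsed.language) and re.fullmatch(
        r"[A-Z]{2}", parsed.territory
    ):
        return f"{parsed.language}-{parsed.territory}"
    return None


if __name__ == "__main__":
    for candidate in ("tr_TR.UTF-8", "English_United States.1252",
                      "Turkish_Türkiye.1254"):
        print(candidate, parse_locale_name(candidate),
              to_simple_bcp47(candidate))
```

Both Windows-style names split cleanly on the separators, which is the sense
in which they fit the POSIX shape, but neither maps to a BCP47 tag without a
lookup table that this sketch deliberately does not attempt.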
>>>
>>> [1]
>>> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
>>>
>>
>>
>> --
>> Sandeep Thakkar
>>
>>
>
> --
> Sandeep Thakkar
>
>

--
Dave Page
VP, Chief Architect, Database Infrastructure
EDB: https://www.enterprisedb.com