On Thu, Aug 8, 2024 at 6:10 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> Thanks. The log didn't offer any more clues, and my colleague David R > has Windows and knows how to work its debugger so we sat down together > and chased this down (thanks David!). > > 1. It is indeed calling abort(), but it's not a PANIC or Assert() in > PostgreSQL, it's an assertion inside Windows' own setlocale(): > > minkernel\crts\ucrt\src\appcrt\convert\mbstowcs.cpp(245) : Assertion > failed: (pwcs == nullptr && sizeInWords == 0) || (pwcs != nullptr && > sizeInWords > 0) > > 2. It is indeed confused about the encoding of the string > "Turkish_Türkiye.1254" itself, and works fine if you use "tr-TR". > > 3. It doesn't happen on 15, because 16 added a key ingredient: > > commit bf03cfd162176d543da79f9398131abc251ddbb9 > Author: Peter Eisentraut <pe...@eisentraut.org> > Date: Tue Jan 3 14:21:40 2023 +0100 > > Windows support in pg_import_system_collations > > That causes it to spin through a bunch of system locales and switch to > them, and the first one is "aa". After it calls: > > setlocale(2, "aa"); > > ... then the next call to restore the previous locale is something like: > > setlocale(2, "Turkish_T\252rkiye.1254"); > > (That \252 == 0xfc probably depends on your system's default > encoding.) It doesn't like that name anymore, and aborts. A minimal > program with just those two lines shows that. > > It appears that after switching to "aa", it interprets the string > passed to the next call to setlocale() as some other encoding > (probably UTF-8, I dunno). I don't know why it doesn't fail and > return NULL, but there is a more general point that it's a bit bonkers > to use non-ASCII byte sequences in the library calls that are used to > control how non-ASCII byte sequences are interpreted. Maybe it can be > done if you're careful, but in particular a naive save-and-restore > sequence just won't work. > > I guess a save-and-restore done with wsetlocale() could fix that. But > I decline to work on that, we need less Windows kludgery in the tree, > not more. I think a better answer is "don't do that". > > Really, we *have* to chase all these non-BCP-47 locales out of the > installer (I hope you can work on that?), yeah, It seems getlocales.cpp needs to be changed to achieve it. I'll look into it out of PostgreSQL (testers > wanted[1]), and out of the world's existing clusters (maybe with > Dave's pg_upgrade idea, someone would need to write a patch, or maybe > someone could write a stand-alone locale migration program that just > connects to a cluster and (using some authoritative source, that's the > key bit to research) and replaces bad old names with nice new ones). > > [1] > https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zdbb+sxds59pv...@mail.gmail.com > -- Sandeep Thakkar