Since ~2019, Windows has had the option "Beta: Use Unicode UTF-8 for worldwide language support". That option breaks the appendShellString() assumption that it can escape every byte except '\0', '\r', and '\n'. Instead, process creation injects U+FFFD REPLACEMENT CHARACTER (UTF-8: ef bf bd) for each byte of the command line not forming valid UTF-8. Here's the Windows Server 2025 output from a test program that sends bytes 0x80..0xFF in a CreateProcessA() command line:

argv[1] = 58 ef bf bd ef bf bd ef bf bd [... "ef bf bd" repeated for the remaining input bytes ...]
GetCommandLineA() = 61 20 58 ef bf bd ef bf bd ef bf bd [... "ef bf bd" repeated for the remaining input bytes ...]
GetCommandLineW() = 61 20 58 fffd fffd fffd [... "fffd" repeated for the remaining input bytes ...]

For PostgreSQL, I expect the most obvious problems will arise for rolname and datname containing non-UTF8. For example, pg_dumpall relies on appendShellString() to call pg_dump for arbitrary datname. pg_dumpall would get "database ... does not exist".

Some ways we might react:

1. Instead of arbitrary bytes in argv[], use a temporary PGSERVICEFILE. For other kinds of appendShellString() input (mostly file paths), we could provide other ways to pass them outside argv, or we could just not support the full character repertoire in those. Windows "8.3 filenames" are a fair workaround.

2. Just fail if the system option is enabled and we would appendShellString() a non-UTF8 value.

3. Fail if we find U+FFFD in arguments. It's valid Unicode, though.

I plan not to work on this myself, and I'm not advocating this as a priority to anyone else. I'm just sending this to record what I learned, in case it helps someone for whom it does become a priority.
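
The failure mode and the checks behind options 2 and 3 above can be sketched roughly as follows. This is Python rather than PostgreSQL code, and it makes assumptions: Python's errors="replace" decoder only stands in for the conversion Windows performs at process creation (it is not the same code path), and the function names simulate_lossy_argv, is_valid_utf8, and contains_replacement_char are hypothetical.

```python
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # U+FFFD, bytes ef bf bd


def simulate_lossy_argv(raw: bytes) -> bytes:
    """Approximate what the child process sees: every byte that does not
    form valid UTF-8 is replaced by U+FFFD. (Stand-in for Windows, assumed
    to behave like Python's 'replace' error handler for this input.)"""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Option 2 style check: would this argument pass through unchanged?"""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def contains_replacement_char(raw: bytes) -> bool:
    """Option 3 style check: look for U+FFFD in a received argument.
    Note that this also flags arguments where the sender legitimately
    included U+FFFD, since it is valid Unicode."""
    return REPLACEMENT_UTF8 in raw


if __name__ == "__main__":
    arg = b"X" + bytes(range(0x80, 0x100))  # same input as the test program
    seen = simulate_lossy_argv(arg)
    print("sender-side check (option 2):", is_valid_utf8(arg))
    print("receiver-side check (option 3):", contains_replacement_char(seen))
    print("first bytes seen:", seen[:7].hex(" "))
```

The sender-side check can refuse before any data is lost, while the receiver-side check can only notice damage after the fact and cannot distinguish it from an intentional U+FFFD.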

https://stackoverflow.com/a/57134096/16371536 shows how to enable the option. I'd be interested to hear test results with that enabled. My hypothesis is that 010_dump_connstr.pl and 200_connstr.pl would fail. (My Windows development environments are all too old, and I stopped short of building a new one for this.) It should also be possible to test this in CI by building an image with the following https://github.com/anarazel/pg-vm-images.git modification:

--- a/scripts/windows_install_dbg.ps1
+++ b/scripts/windows_install_dbg.ps1
@@ -9,6 +9,15 @@
 mkdir c:\t
 cd c:\t
 
+echo "enabling UTF8"
+Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' `
+    -Name 'ACP' -Value '65001'
+Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' `
+    -Name 'OEMCP' -Value '65001'
+Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' `
+    -Name 'MACCP' -Value '65001'
+
+
 echo "configuring windows error reporting"
 
 # prevent windows error handling dialog from causing hangs
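
To confirm on a test machine that the setting actually took effect, one could query the active ANSI code page, which should report 65001 when the UTF-8 option is on. The following is a minimal sketch, not something from the pg-vm-images repository: the helper names active_code_page and utf8_option_enabled are hypothetical, and it relies on the Win32 GetACP() function being reachable through ctypes on Windows.

```python
import ctypes
import sys

CP_UTF8 = 65001  # the code page value written by the registry change


def active_code_page():
    """Return the process's ANSI code page on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    # GetACP() takes no arguments and returns the ANSI code page identifier.
    return ctypes.windll.kernel32.GetACP()


def utf8_option_enabled() -> bool:
    """True if the active ANSI code page is UTF-8 (always False off Windows)."""
    return active_code_page() == CP_UTF8


if __name__ == "__main__":
    print("ANSI code page:", active_code_page())
    print("UTF-8 option enabled:", utf8_option_enabled())
```

A check like this could also serve as the "system option is enabled" test in option 2, though that is an assumption about how such a check would be wired in.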