Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Corinna, On Thu, 3 Sep 2020 19:59:12 +0200 Corinna Vinschen wrote: > On Sep 2 18:38, Corinna Vinschen wrote: > > Hi Takashi, > > > > On Sep 3 01:25, Takashi Yano via Cygwin-patches wrote: > > > Hi Corinna, > > > > > > On Wed, 2 Sep 2020 17:24:50 +0200 > > > Corinna Vinschen wrote: > > > > > > get_locale_from_env() and get_langinfo() should go away. If we just > > > > > > need a codepage for get_ttyp ()->term_code_page, we should really > > > > > > find a > > > > > > way to do this from within internal_setlocale(). > > > > > > > > > > I looked into internal_setlocale() code, but I could not found > > > > > the code which handles thecode page. I found the code handling > > > > > the code page in __set_charset_from_locale() function in nlsfuncs.cc, > > > > > but it does not return code page itself. Could you please explain > > > > > more detail of your idea? > > > > > > > > I had none yet :) I was just musing, without actually thinking about a > > > > solution. But I think this isn't very complicated. Given this is > > > > inside Cygwin, nothing keeps the function to have a well-defined > > > > side-effect, as in setting a (not yet existing) member "term_code_page" > > > > of cygheap->locale. > > > > > > > > Kind of like this: > > > > [...] > > > I have tried your code, however, it does not work as expected. > > > It seems that __set_charset_from_locale() is not called. > > > cygheap->locale.term_code_page is always 0. > > > > Ah, right! Take a look into newlib/libc/locale/locale.c, function > > __loadlocale(). This function is called from _setlocale_r(). However, > > it calls __set_charset_from_locale() *only* if the charset isn't already > > given explicitely in the LC_* or LANG string, because otherwise we > > already know the charset, after all. > > > > Darn! That foils my plans for world domination... > > > > > Let me consider a while. > > > > Thanks, I'll do the same. I'd really like to simplify this stuff > > and doing the locale shuffle in two entirely different locations > > at different times is prone to getting out of sync. > > The only idea I had so far was, changing the way __set_charset_from_locale > works from within _setlocale_r: > > We could add a Cygwin-specific function only fetching the codepage and > call it unconditionally from _setlocale_r. __set_charset_from_locale is > called with a new parameter "codepage", so it doesn't have to fetch the > CP by itself, but it's still only called from _setlocale_r if necessary. > > Would that be sufficient? The CP conversion from 20127/ASCII to 65001/UTF8 > could be done at the point the codepage is actually required. I think I have found the answer to your request. Patch attached. What do you think of this patch? Calling initial_setlocale() is necessary because nl_langinfo() always returns "ANSI_X3.4-1968" regardless locale setting if this is not called. -- Takashi Yano 0001-Cygwin-pty-Replace-pty-specific-locale-functions-wit.patch Description: Binary data
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Johannes and Corinna, On Tue, 1 Sep 2020 18:19:16 +0200 (CEST) Johannes Schindelin wrote: > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > correct, but after that (at least if Pseudo Console support is enabled), > we try to find the default code page for that `LCID`, which is ASCII > (437). Subsequently, we set the Console output code page to that value, > completely ignoring that we wanted to use UTF-8. > > Let's not ignore the specifically asked-for UTF-8 character set. > > While at it, let's also set the Console output code page even if Pseudo > Console support is disabled; contrary to the behavior of v3.0.7, the > Console output code page is not ignored in that case. > > The most common symptom would be that console applications which do not > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > https://github.com/msys2/MSYS2-packages/issues/2012, > https://github.com/rust-lang/cargo/issues/8369, > https://github.com/git-for-windows/git/issues/2734, > https://github.com/git-for-windows/git/issues/2793, > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > few others. > > Signed-off-by: Johannes Schindelin > --- > winsup/cygwin/fhandler_tty.cc | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > index 06789a500..414c26992 100644 > --- a/winsup/cygwin/fhandler_tty.cc > +++ b/winsup/cygwin/fhandler_tty.cc > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void) >char charset[ENCODING_LEN + 1] = "ASCII"; >LCID lcid = get_langinfo (locale, charset); > > + /* Special-case the UTF-8 character set */ > + if (strcasecmp (charset, "UTF-8") == 0) > +{ > + get_ttyp ()->term_code_page = CP_UTF8; > + SetConsoleCP (CP_UTF8); > + SetConsoleOutputCP (CP_UTF8); > + return; > +} > + >/* Set console code page from locale */ >if (get_pseudo_console ()) > { > -- > 2.27.0 I would like to propose a counter patch attached. What do you think of this patch? This patch does not treat UTF-8 as special. -- Takashi Yano 0001-Cygwin-pty-Prevent-garbled-output-when-pseudo-consol.patch Description: Binary data
[PATCH] Cygwin: create install dir for libs
This fixes a race in parallel installs. --- winsup/cygwin/Makefile.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/winsup/cygwin/Makefile.in b/winsup/cygwin/Makefile.in index fac81759e..ea0243033 100644 --- a/winsup/cygwin/Makefile.in +++ b/winsup/cygwin/Makefile.in @@ -600,7 +600,7 @@ install: install-libs install-headers install-man install-ldif install_target \ uninstall: uninstall-libs uninstall-headers uninstall-man install-libs: $(TARGET_LIBS) - @$(MKDIRP) $(DESTDIR)$(bindir) + @$(MKDIRP) $(DESTDIR)$(bindir) $(DESTDIR)$(tooldir)/lib $(INSTALL_PROGRAM) $(TEST_DLL_NAME) $(DESTDIR)$(bindir)/$(DLL_NAME); \ for i in $^; do \ $(INSTALL_DATA) $$i $(DESTDIR)$(tooldir)/lib/`basename $$i` ; \ -- 2.28.0
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Takashi, On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote: > Hi Corinna, > > On Thu, 3 Sep 2020 19:59:12 +0200 > Corinna Vinschen wrote: > > The only idea I had so far was, changing the way __set_charset_from_locale > > works from within _setlocale_r: > > > > We could add a Cygwin-specific function only fetching the codepage and > > call it unconditionally from _setlocale_r. __set_charset_from_locale is > > called with a new parameter "codepage", so it doesn't have to fetch the > > CP by itself, but it's still only called from _setlocale_r if necessary. > > > > Would that be sufficient? The CP conversion from 20127/ASCII to 65001/UTF8 > > could be done at the point the codepage is actually required. > > I think I have found the answer to your request. > Patch attached. What do you think of this patch? > > Calling initial_setlocale() is necessary because > nl_langinfo() always returns "ANSI_X3.4-1968" > regardless locale setting if this is not called. Well, this is correct. Per POSIX, a standard-conformant application is running in the "C" locale unless it calls setlocale() explicitely. That's one reason Cygwin defaults to UTF-8 internally. Everything, including the terminal, is supposed to default to UTF-8. That's the most sane default, even if an application is not locale-aware. So, to follow POSIX, initial_setlocale() is used to set up the environment and command line stuff and then, before calling the application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C"); to reset the apps default locale to "C". That's why nl_langinfo() returns "ANSI_X3.4-1968". However, the initial_setlocale() call in dll_crt0_1 calls internal_setlocale(), and *that* function sets the conversion functions for the internal conversions. What it *doesn't* do yet at the moment is to store the charset name itself or, better, the equivalent codepage. If we change that, setup_locale can simply go away. Below is a patch doing just that. Can you please check if that works in your test scenarios? However, there's something which worries me. Why do we need or set the pseudo terminal codepage at all? I see that you convert from MB charset to MB charset and then use the result in WriteFile to the connecting pipes. Question is this: Why not just converting the strings via sys_mbstowcs to a UTF-16 string and then send that over the line, using WriteConsoleW for the final output to the console? That would simplify this stuff quite a bit, wouldn't it? After all, for writing UTF-16 to the console, we simply don't need to know or care for the console CP. Thanks, Corinna >From d5dff1690d1a228579eef472441d67fb6ef20b5e Mon Sep 17 00:00:00 2001 From: Corinna Vinschen Date: Fri, 4 Sep 2020 14:39:58 +0200 Subject: [PATCH] Cygwin: fetch codepage for pseudo console at initial_setlocale time drop locale checking in fhandler_tty code. Create function __eval_codepage_from_internal_charset called from internal_setlocale to store term_code_page in cygheap, rather than in tty. Drop now unneeded setup_locale calls during fork and execve. Signed-off-by: Corinna Vinschen --- winsup/cygwin/cygheap.h | 1 + winsup/cygwin/fhandler.h | 1 - winsup/cygwin/fhandler_tty.cc | 195 ++ winsup/cygwin/nlsfuncs.cc | 54 +- winsup/cygwin/spawn.cc| 12 --- 5 files changed, 60 insertions(+), 203 deletions(-) diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h index 8877cc358c39..2b84f4252071 100644 --- a/winsup/cygwin/cygheap.h +++ b/winsup/cygwin/cygheap.h @@ -341,6 +341,7 @@ struct cygheap_debug struct cygheap_locale { mbtowc_p mbtowc; + UINT term_code_page; }; struct user_heap_info diff --git a/winsup/cygwin/fhandler.h b/winsup/cygwin/fhandler.h index b4ba9428aa90..af619df5f9b1 100644 --- a/winsup/cygwin/fhandler.h +++ b/winsup/cygwin/fhandler.h @@ -2336,7 +2336,6 @@ class fhandler_pty_slave: public fhandler_pty_common void set_switch_to_pcon (void); void reset_switch_to_pcon (void); void mask_switch_to_pcon_in (bool mask); - void setup_locale (void); }; #define __ptsname(buf, unit) __small_sprintf ((buf), "/dev/pty%d", (unit)) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 8bf39c3e6cf2..f207a0b27892 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -1614,8 +1614,9 @@ fhandler_pty_master::write (const void *ptr, size_t len) if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console) { size_t nlen; - char *buf = convert_mb_str - (CP_UTF8, &nlen, get_ttyp ()->term_code_page, (const char *) ptr, len); + char *buf = convert_mb_str (CP_UTF8, &nlen, + cygheap->locale.term_code_page, + (const char *) ptr, len); WaitForSingleObject (input_mutex, INFINITE); @@ -1782,185 +1783,6 @@ fhandler_pty_common::set_close_on_exec (bool val) close_on_exec (val); } -/
Re: [PATCH] Cygwin: create install dir for libs
On Aug 27 09:02, David McFarland via Cygwin-patches wrote: > This fixes a race in parallel installs. > --- > winsup/cygwin/Makefile.in | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/winsup/cygwin/Makefile.in b/winsup/cygwin/Makefile.in > index fac81759e..ea0243033 100644 > --- a/winsup/cygwin/Makefile.in > +++ b/winsup/cygwin/Makefile.in > @@ -600,7 +600,7 @@ install: install-libs install-headers install-man > install-ldif install_target \ > uninstall: uninstall-libs uninstall-headers uninstall-man > > install-libs: $(TARGET_LIBS) > - @$(MKDIRP) $(DESTDIR)$(bindir) > + @$(MKDIRP) $(DESTDIR)$(bindir) $(DESTDIR)$(tooldir)/lib > $(INSTALL_PROGRAM) $(TEST_DLL_NAME) $(DESTDIR)$(bindir)/$(DLL_NAME); \ > for i in $^; do \ > $(INSTALL_DATA) $$i $(DESTDIR)$(tooldir)/lib/`basename $$i` ; \ > -- > 2.28.0 Pushed. Thanks, Corinna
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On 2020-09-04 06:44, Corinna Vinschen wrote: > Hi Takashi, > > On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote: >> Hi Corinna, >> >> On Thu, 3 Sep 2020 19:59:12 +0200 >> Corinna Vinschen wrote: >>> The only idea I had so far was, changing the way __set_charset_from_locale >>> works from within _setlocale_r: >>> >>> We could add a Cygwin-specific function only fetching the codepage and >>> call it unconditionally from _setlocale_r. __set_charset_from_locale is >>> called with a new parameter "codepage", so it doesn't have to fetch the >>> CP by itself, but it's still only called from _setlocale_r if necessary. >>> >>> Would that be sufficient? The CP conversion from 20127/ASCII to 65001/UTF8 >>> could be done at the point the codepage is actually required. >> >> I think I have found the answer to your request. >> Patch attached. What do you think of this patch? >> >> Calling initial_setlocale() is necessary because >> nl_langinfo() always returns "ANSI_X3.4-1968" >> regardless locale setting if this is not called. > > Well, this is correct. Per POSIX, a standard-conformant application is > running in the "C" locale unless it calls setlocale() explicitely. > That's one reason Cygwin defaults to UTF-8 internally. Everything, > including the terminal, is supposed to default to UTF-8. That's the > most sane default, even if an application is not locale-aware. > > So, to follow POSIX, initial_setlocale() is used to set up the > environment and command line stuff and then, before calling the > application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C"); > to reset the apps default locale to "C". That's why nl_langinfo() > returns "ANSI_X3.4-1968". > > However, the initial_setlocale() call in dll_crt0_1 calls > internal_setlocale(), and *that* function sets the conversion functions > for the internal conversions. What it *doesn't* do yet at the moment is > to store the charset name itself or, better, the equivalent codepage. > > If we change that, setup_locale can simply go away. Below is a patch > doing just that. Can you please check if that works in your test > scenarios? > > However, there's something which worries me. Why do we need or set the > pseudo terminal codepage at all? I see that you convert from MB charset > to MB charset and then use the result in WriteFile to the connecting > pipes. Question is this: Why not just converting the strings via > sys_mbstowcs to a UTF-16 string and then send that over the line, using > WriteConsoleW for the final output to the console? That would simplify > this stuff quite a bit, wouldn't it? After all, for writing UTF-16 to > the console, we simply don't need to know or care for the console CP. IIRC his locale was ja_JP.UTF-8 but he got English messages on CP 932! -- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada This email may be disturbing to some readers as it contains too much technical detail. Reader discretion is advised. [Data in IEC units and prefixes, physical quantities in SI.]
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Takashi, On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote: > Hi Johannes and Corinna, > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST) > Johannes Schindelin wrote: > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > correct, but after that (at least if Pseudo Console support is enabled), > > we try to find the default code page for that `LCID`, which is ASCII > > (437). Subsequently, we set the Console output code page to that value, > > completely ignoring that we wanted to use UTF-8. > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > While at it, let's also set the Console output code page even if Pseudo > > Console support is disabled; contrary to the behavior of v3.0.7, the > > Console output code page is not ignored in that case. > > > > The most common symptom would be that console applications which do not > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > https://github.com/msys2/MSYS2-packages/issues/2012, > > https://github.com/rust-lang/cargo/issues/8369, > > https://github.com/git-for-windows/git/issues/2734, > > https://github.com/git-for-windows/git/issues/2793, > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > > few others. > > > > Signed-off-by: Johannes Schindelin > > --- > > winsup/cygwin/fhandler_tty.cc | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > > index 06789a500..414c26992 100644 > > --- a/winsup/cygwin/fhandler_tty.cc > > +++ b/winsup/cygwin/fhandler_tty.cc > > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void) > >char charset[ENCODING_LEN + 1] = "ASCII"; > >LCID lcid = get_langinfo (locale, charset); > > > > + /* Special-case the UTF-8 character set */ > > + if (strcasecmp (charset, "UTF-8") == 0) > > +{ > > + get_ttyp ()->term_code_page = CP_UTF8; > > + SetConsoleCP (CP_UTF8); > > + SetConsoleOutputCP (CP_UTF8); > > + return; > > +} > > + > >/* Set console code page from locale */ > >if (get_pseudo_console ()) > > { > > -- > > 2.27.0 > > I would like to propose a counter patch attached. > What do you think of this patch? > > This patch does not treat UTF-8 as special. Sure, but it only fixes the issue in `disable_pcon` mode in the current tip commit. That's because a new Pseudo Console is created for every spawned non-Cygwin console application, and that new Pseudo Console does _not_ have the code page set by your patch. I verified that this patch works around the issue in `disable_pcon` mode, which is good. Ciao, Dscho
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Corinna, On Fri, 4 Sep 2020 14:44:00 +0200 Corinna Vinschen wrote: > Hi Takashi, > > On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote: > > Hi Corinna, > > > > On Thu, 3 Sep 2020 19:59:12 +0200 > > Corinna Vinschen wrote: > > > The only idea I had so far was, changing the way __set_charset_from_locale > > > works from within _setlocale_r: > > > > > > We could add a Cygwin-specific function only fetching the codepage and > > > call it unconditionally from _setlocale_r. __set_charset_from_locale is > > > called with a new parameter "codepage", so it doesn't have to fetch the > > > CP by itself, but it's still only called from _setlocale_r if necessary. > > > > > > Would that be sufficient? The CP conversion from 20127/ASCII to > > > 65001/UTF8 > > > could be done at the point the codepage is actually required. > > > > I think I have found the answer to your request. > > Patch attached. What do you think of this patch? > > > > Calling initial_setlocale() is necessary because > > nl_langinfo() always returns "ANSI_X3.4-1968" > > regardless locale setting if this is not called. > > Well, this is correct. Per POSIX, a standard-conformant application is > running in the "C" locale unless it calls setlocale() explicitely. > That's one reason Cygwin defaults to UTF-8 internally. Everything, > including the terminal, is supposed to default to UTF-8. That's the > most sane default, even if an application is not locale-aware. > > So, to follow POSIX, initial_setlocale() is used to set up the > environment and command line stuff and then, before calling the > application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C"); > to reset the apps default locale to "C". That's why nl_langinfo() > returns "ANSI_X3.4-1968". > > However, the initial_setlocale() call in dll_crt0_1 calls > internal_setlocale(), and *that* function sets the conversion functions > for the internal conversions. What it *doesn't* do yet at the moment is > to store the charset name itself or, better, the equivalent codepage. > > If we change that, setup_locale can simply go away. Below is a patch > doing just that. Can you please check if that works in your test > scenarios? I tried your patch, but unfortunately it does not work. cygheap->locale.term_code_page is 0 in pty master. If the following lines are moved in internal_setlocale(), const char *charset = __locale_charset (__get_global_locale ()); debug_printf ("Global charset set to %s", charset); /* Store codepage to be utilized by pseudo console code. */ cygheap->locale.term_code_page = __eval_codepage_from_internal_charset (charset); in internal_setlocale() before /* Don't do anything if the charset hasn't actually changed. */ if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc) return; cygheap->locale.term_code_page is always 65001 even if mintty is startted by mintty -o locale=ja_JP -o charset=CP932 or mintty -o locale=ja_JP -o charset=EUCJP Perhaps, this is because LANG is not set properly yet when mintty is started. > However, there's something which worries me. Why do we need or set the > pseudo terminal codepage at all? I see that you convert from MB charset > to MB charset and then use the result in WriteFile to the connecting > pipes. Question is this: Why not just converting the strings via > sys_mbstowcs to a UTF-16 string and then send that over the line, using > WriteConsoleW for the final output to the console? That would simplify > this stuff quite a bit, wouldn't it? After all, for writing UTF-16 to > the console, we simply don't need to know or care for the console CP. WriteConsoleW() cannot be used because the handle to_master_cyg is not a console handle. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Johannes, On Fri, 4 Sep 2020 08:23:42 +0200 (CEST) Johannes Schindelin wrote: > Hi Takashi, > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote: > > > Hi Johannes and Corinna, > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST) > > Johannes Schindelin wrote: > > > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > > correct, but after that (at least if Pseudo Console support is enabled), > > > we try to find the default code page for that `LCID`, which is ASCII > > > (437). Subsequently, we set the Console output code page to that value, > > > completely ignoring that we wanted to use UTF-8. > > > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > > > While at it, let's also set the Console output code page even if Pseudo > > > Console support is disabled; contrary to the behavior of v3.0.7, the > > > Console output code page is not ignored in that case. > > > > > > The most common symptom would be that console applications which do not > > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > > https://github.com/msys2/MSYS2-packages/issues/2012, > > > https://github.com/rust-lang/cargo/issues/8369, > > > https://github.com/git-for-windows/git/issues/2734, > > > https://github.com/git-for-windows/git/issues/2793, > > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > > > few others. > > > > > > Signed-off-by: Johannes Schindelin > > > --- > > > winsup/cygwin/fhandler_tty.cc | 9 + > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > > > index 06789a500..414c26992 100644 > > > --- a/winsup/cygwin/fhandler_tty.cc > > > +++ b/winsup/cygwin/fhandler_tty.cc > > > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void) > > >char charset[ENCODING_LEN + 1] = "ASCII"; > > >LCID lcid = get_langinfo (locale, charset); > > > > > > + /* Special-case the UTF-8 character set */ > > > + if (strcasecmp (charset, "UTF-8") == 0) > > > +{ > > > + get_ttyp ()->term_code_page = CP_UTF8; > > > + SetConsoleCP (CP_UTF8); > > > + SetConsoleOutputCP (CP_UTF8); > > > + return; > > > +} > > > + > > >/* Set console code page from locale */ > > >if (get_pseudo_console ()) > > > { > > > -- > > > 2.27.0 > > > > I would like to propose a counter patch attached. > > What do you think of this patch? > > > > This patch does not treat UTF-8 as special. > > Sure, but it only fixes the issue in `disable_pcon` mode in the current > tip commit. That's because a new Pseudo Console is created for every > spawned non-Cygwin console application, and that new Pseudo Console does > _not_ have the code page set by your patch. You are right. However, if pseudo console is enabled, the app which works correclty in command prompt should work as well in pseudo console. Therefore, there is nothing to be fixed. > I verified that this patch works around the issue in `disable_pcon` mode, > which is good. Thanks for testing. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Takashi, On Sep 4 23:50, Takashi Yano via Cygwin-patches wrote: > Hi Corinna, > > On Fri, 4 Sep 2020 14:44:00 +0200 > Corinna Vinschen wrote: > > On Sep 4 18:21, Takashi Yano via Cygwin-patches wrote: > > > I think I have found the answer to your request. > > > Patch attached. What do you think of this patch? > > > > > > Calling initial_setlocale() is necessary because > > > nl_langinfo() always returns "ANSI_X3.4-1968" > > > regardless locale setting if this is not called. > > [...] > > However, the initial_setlocale() call in dll_crt0_1 calls > > internal_setlocale(), and *that* function sets the conversion functions > > for the internal conversions. What it *doesn't* do yet at the moment is > > to store the charset name itself or, better, the equivalent codepage. > > > > If we change that, setup_locale can simply go away. Below is a patch > > doing just that. Can you please check if that works in your test > > scenarios? > > I tried your patch, but unfortunately it does not work. > cygheap->locale.term_code_page is 0 in pty master. > > If the following lines are moved in internal_setlocale(), > > const char *charset = __locale_charset (__get_global_locale ()); > debug_printf ("Global charset set to %s", charset); > /* Store codepage to be utilized by pseudo console code. */ > cygheap->locale.term_code_page = > __eval_codepage_from_internal_charset (charset); > > in internal_setlocale() before > > /* Don't do anything if the charset hasn't actually changed. */ > if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc) > return; Uh, that makes sense. > cygheap->locale.term_code_page is always 65001 even if mintty is > startted by > mintty -o locale=ja_JP -o charset=CP932 > or > mintty -o locale=ja_JP -o charset=EUCJP > > Perhaps, this is because LANG is not set properly yet when mintty > is started. Yeah, that's the reason. The above settings of locale and charset on the CLI should only take over when mintty calls setlocale() with a matching string. The fact that it sets the matching value in the environment, too, should only affect child processes, not mintty itself. But it's incorrect to call initial_setlocale() from setup_locale() without resetting it to its original value. Unfortunately that doesn't solve any problem with the pseudo console codepage. Drat. It sounds like you need the terminal's charset, rather than the one set in the environment. So this boils down to the fact that term_code_page must be set after the application is already running and as soo as it creates the pty, me thinks. What if __eval_codepage_from_internal_charset() is called at pty creation? Or even on reading from /writing to the pty the first time? That should always be late enough to fetch the correct codepage. Patch attached. Does that work as expected? > > However, there's something which worries me. Why do we need or set the > > pseudo terminal codepage at all? I see that you convert from MB charset > > to MB charset and then use the result in WriteFile to the connecting > > pipes. Question is this: Why not just converting the strings via > > sys_mbstowcs to a UTF-16 string and then send that over the line, using > > WriteConsoleW for the final output to the console? That would simplify > > this stuff quite a bit, wouldn't it? After all, for writing UTF-16 to > > the console, we simply don't need to know or care for the console CP. > > WriteConsoleW() cannot be used because the handle to_master_cyg is > not a console handle. Sigh, of course. It's all just pipes. I forgot that, sorry. Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread() calls char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen, CP_UTF8, ptr, wlen); ^^^ [...] WriteFile (to_master_cyg, ... But then, after the code breaks from that loop, it calls char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen, GetConsoleOutputCP (), ptr, wlen); ^ [...] process_opost_output (to_master_cyg, ... process_opost_output then calls WriteFile on that to_master_cyg handle, just like the WriteFile call above. Is that really correct? Shouldn't the second invocation use CP_UTF8 as well? Corinna >From 89acd62e88871a89b9bcb2964924f8960c7673ec Mon Sep 17 00:00:00 2001 From: Corinna Vinschen Date: Fri, 4 Sep 2020 21:20:35 +0200 Subject: [PATCH] Cygwin: try calling __eval_codepage_from_internal_charset in pty code the idea being, that we need the tty's locale and charset, not the environment locale settings when starting the tty. --- winsup/cygwin/fhandler_tty.cc | 17 + winsup/cygwin/nlsfuncs.cc | 12 +--- 2 files changed, 22 insertions(+), 7 deletions(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index f207a0b27892..feb014175123 100644
[PATCH v4 2/3] regparm: make code highlight happy
--- winsup/cygwin/regparm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/winsup/cygwin/regparm.h b/winsup/cygwin/regparm.h index cce1bab4a..a14c501db 100644 --- a/winsup/cygwin/regparm.h +++ b/winsup/cygwin/regparm.h @@ -8,7 +8,7 @@ details. */ #pragma once -#if defined (__x86_64__) || defined (__CYGMAGIC__) +#if defined (__x86_64__) || defined (__CYGMAGIC__) || !defined (__GNUC__) # define __reg1 # define __reg2 # define __reg3 -- 2.20.1.windows.1
[PATCH v4 1/3] Cygwin: rewrite and make public cmdline parser
This commit rewrites the cmdline parser to achieve the following: * MSVCRT compatibility. Except for the single-quote handling (an extension for compatibility with old Cygwin), the parser now interprets option boundaries exactly like MSVCR since 2008. This fixes the issue where our escaping does not work with our own parsing. * Clarity. Since globify() is no longer responsible for handling the opening and closing of quotes, the code is much simpler. * Sanity. The GLOB_NOCHECK flag is removed, so a failed glob correctly returns the literal value. Without the change, anything path-like would be garbled by globify's escaping. * A memory leak in the @file expansion is removed by rewriting it to use a stack of buffers. This also simplifies the code since we no longer have to move stuff. The "downside" is that tokens can no longer cross file boundaries. Some clarifications are made in the documentation for when globs are not expanded. The functions are made public for testing, but my tcl setup is currently too messed up for running them! I did test them as an isolated program on WSL-Debian. The change fixes two complaints of mine: * That cygwin is incompatible with its own escape.[1] * That there is no way to echo `C:\"` from win32.[2] [1]: https://cygwin.com/pipermail/cygwin/2020-June/245162.html [2]: https://cygwin.com/pipermail/cygwin/2019-October/242790.html (It's never the point to spawn cygwin32 from cygwin64. Consistency matters: with yourself always, and with the outside world when you are supposed to.) This is the fourth version of the patch. I provide all my patches to Cygwin, including this one and all future ones, under the 2-clause BSD license. --- winsup/cygwin/common.din | 2 + winsup/cygwin/dcrt0.cc| 297 +--- winsup/cygwin/include/sys/cygwin.h| 2 + winsup/cygwin/winf.cc | 378 +- winsup/cygwin/winf.h | 8 +- winsup/cygwin/winsup.h| 7 +- winsup/doc/cygwinenv.xml | 8 +- winsup/doc/faq-api.xml| 2 +- winsup/doc/misc-funcs.xml | 71 + winsup/testsuite/Makefile.in | 6 +- winsup/testsuite/winsup.api/cmdline.c | 102 +++ 11 files changed, 577 insertions(+), 306 deletions(-) create mode 100644 winsup/testsuite/winsup.api/cmdline.c diff --git a/winsup/cygwin/common.din b/winsup/cygwin/common.din index a7b4aa2b0..f15dc6a9e 100644 --- a/winsup/cygwin/common.din +++ b/winsup/cygwin/common.din @@ -393,6 +393,8 @@ cygwin_split_path NOSIGFE cygwin_stackdump SIGFE cygwin_umount SIGFE cygwin_winpid_to_pid SIGFE +cygwin_cmdline_parse SIGFE +cygwin_cmdline_build SIGFE daemon SIGFE dbm_clearerr SIGFE dbm_close SIGFE diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc index 5d8b4b74e..6f906102d 100644 --- a/winsup/cygwin/dcrt0.cc +++ b/winsup/cygwin/dcrt0.cc @@ -10,7 +10,6 @@ details. */ #include "miscfuncs.h" #include #include -#include "glob.h" #include #include #include @@ -35,6 +34,7 @@ details. */ #include "cygxdr.h" #include "fenv.h" #include "ntdll.h" +#include "winf.h" #define MAX_AT_FILE_LEVEL 10 @@ -77,292 +77,6 @@ do_global_ctors (void (**in_pfunc)(), int force) (*pfunc) (); } -/* - * Replaces @file in the command line with the contents of the file. - * There may be multiple @file's in a single command line - * A \@file is replaced with @file so that echo \@foo would print - * @foo and not the contents of foo. - */ -static bool __stdcall -insert_file (char *name, char *&cmd) -{ - HANDLE f; - DWORD size; - tmp_pathbuf tp; - - PWCHAR wname = tp.w_get (); - sys_mbstowcs (wname, NT_MAX_PATH, name + 1); - f = CreateFileW (wname, - GENERIC_READ,/* open for reading */ - FILE_SHARE_VALID_FLAGS, /* share for reading*/ - &sec_none_nih, /* default security */ - OPEN_EXISTING, /* existing file only */ - FILE_ATTRIBUTE_NORMAL, /* normal file */ - NULL); /* no attr. template*/ - - if (f == INVALID_HANDLE_VALUE) -{ - debug_printf ("couldn't open file '%s', %E", name); - return false; -} - - /* This only supports files up to about 4 billion bytes in - size. I am making the bold assumption that this is big - enough for this feature */ - size = GetFileSize (f, NULL); - if (size == 0x) -{ - CloseHandle (f); - debug_printf ("couldn't get file size for '%s', %E", name); - return false; -} - - int new_size = strlen (cmd) + size + 2; - char *tmp = (char *) malloc (new_size); - if (!tmp) -{ - CloseHandle (f); - debug_printf ("malloc failed, %E"); - return false; -} - - /* realloc passed as it should */ - DWORD rf_read; - BOOL rf_result; - rf_result = ReadFile (f,
[PATCH v4 3/3] testsuite: don't strip dir from obj files
Make has no idea how to build an o file when the correspoinding c file is not there. That happens when we strip the dir. --- winsup/testsuite/Makefile.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/winsup/testsuite/Makefile.in b/winsup/testsuite/Makefile.in index bdc116d12..ad3656be8 100644 --- a/winsup/testsuite/Makefile.in +++ b/winsup/testsuite/Makefile.in @@ -75,7 +75,7 @@ endif RUNTIME=$(cygwin_build)/cygwin0.dll $(cygwin_build)/libcygwin0.a TESTSUP_LIB_NAME:=libltp.a -TESTSUP_OFILES:=${sort ${addsuffix .o,${basename ${notdir ${wildcard $(libltp_srcdir)/lib/*.c} +TESTSUP_OFILES:=${sort ${addsuffix .o,${basename ${wildcard $(libltp_srcdir)/lib/*.c override ALL_CFLAGS:=${filter-out -O%,$(ALL_CFLAGS)} override COMPILE_CC:=${filter-out -O%,$(COMPILE_CC)} -- 2.20.1.windows.1