Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 1 18:19, Johannes Schindelin wrote: > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > correct, but after that (at least if Pseudo Console support is enabled), > we try to find the default code page for that `LCID`, which is ASCII > (437). Subsequently, we set the Console output code page to that value, > completely ignoring that we wanted to use UTF-8. > > Let's not ignore the specifically asked-for UTF-8 character set. > > While at it, let's also set the Console output code page even if Pseudo > Console support is disabled; contrary to the behavior of v3.0.7, the > Console output code page is not ignored in that case. > > The most common symptom would be that console applications which do not > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > https://github.com/msys2/MSYS2-packages/issues/2012, > https://github.com/rust-lang/cargo/issues/8369, > https://github.com/git-for-windows/git/issues/2734, > https://github.com/git-for-windows/git/issues/2793, > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > few others. > > Signed-off-by: Johannes Schindelin > --- > winsup/cygwin/fhandler_tty.cc | 9 + > 1 file changed, 9 insertions(+) Ok guys, I'm not opposed to this change in terms of its result, but I'm starting to wonder why all this locale code in fhandler_tty is necessary at all. I see that get_langinfo() calls __loadlocale and performs a lot of stuff on the charsets which looks like duplicates of the initial_setlocale() call performed at DLL startup. If there's anything missing in the initial_setlocale() call which would be required by the pseudo tty code? What exactly is it? The codepage? And why can't we just add the info to cygheap->locale at initial_setlocale() time so it's available at exec time without going through all this hassle every time? Apart from that, all this locale/charset/lcid stuff should be concentrated in nlsfunc.cc ideally. Thanks, Corinna
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 2 10:30, Corinna Vinschen wrote: > On Sep 1 18:19, Johannes Schindelin wrote: > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > correct, but after that (at least if Pseudo Console support is enabled), > > we try to find the default code page for that `LCID`, which is ASCII > > (437). Subsequently, we set the Console output code page to that value, > > completely ignoring that we wanted to use UTF-8. > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > While at it, let's also set the Console output code page even if Pseudo > > Console support is disabled; contrary to the behavior of v3.0.7, the > > Console output code page is not ignored in that case. > > > > The most common symptom would be that console applications which do not > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > https://github.com/msys2/MSYS2-packages/issues/2012, > > https://github.com/rust-lang/cargo/issues/8369, > > https://github.com/git-for-windows/git/issues/2734, > > https://github.com/git-for-windows/git/issues/2793, > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > > few others. > > > > Signed-off-by: Johannes Schindelin > > --- > > winsup/cygwin/fhandler_tty.cc | 9 + > > 1 file changed, 9 insertions(+) > > Ok guys, I'm not opposed to this change in terms of its result, > but I'm starting to wonder why all this locale code in fhandler_tty > is necessary at all. > > I see that get_langinfo() calls __loadlocale and performs a lot of stuff > on the charsets which looks like duplicates of the initial_setlocale() > call performed at DLL startup. > > If there's anything missing in the initial_setlocale() call which would > be required by the pseudo tty code? What exactly is it? The codepage? > And why can't we just add the info to cygheap->locale at initial_setlocale() > time so it's available at exec time without going through all this hassle > every time? > > Apart from that, all this locale/charset/lcid stuff should be concentrated > in nlsfunc.cc ideally. get_locale_from_env() and get_langinfo() should go away. If we just need a codepage for get_ttyp ()->term_code_page, we should really find a way to do this from within internal_setlocale(). Corinna
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Wed, 2 Sep 2020 10:30:14 +0200 Corinna Vinschen wrote: > On Sep 1 18:19, Johannes Schindelin wrote: > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > correct, but after that (at least if Pseudo Console support is enabled), > > we try to find the default code page for that `LCID`, which is ASCII > > (437). Subsequently, we set the Console output code page to that value, > > completely ignoring that we wanted to use UTF-8. > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > While at it, let's also set the Console output code page even if Pseudo > > Console support is disabled; contrary to the behavior of v3.0.7, the > > Console output code page is not ignored in that case. > > > > The most common symptom would be that console applications which do not > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > https://github.com/msys2/MSYS2-packages/issues/2012, > > https://github.com/rust-lang/cargo/issues/8369, > > https://github.com/git-for-windows/git/issues/2734, > > https://github.com/git-for-windows/git/issues/2793, > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > > few others. > > > > Signed-off-by: Johannes Schindelin > > --- > > winsup/cygwin/fhandler_tty.cc | 9 + > > 1 file changed, 9 insertions(+) > > Ok guys, I'm not opposed to this change in terms of its result, I am sorry, but I cannot agree with Johannes's patch. For example, code page in Japan is CP932 by default. In this case, cmd.exe, netsh.exe and so on are generate messages in Japanese. If the code page is set to CP_UTF8, messages from such commands changes to english. I guess similar things happen for other locales. I do not prefer this result. Furthermore, I looked into the issue: https://github.com/git-for-windows/git/issues/2734 and I found that git-for-windows always use utf-8 encoding even if the locale is ja_JP.CP932. It does not change coding based on locale or code page. Even with Johannes's patch, if mintty is started with locale ja_JP.CP932, the file name will be garbled bacause SetConsoleOutputCP(CP_UTF8) will not be called. IMHO, it is the problem of git-for-windows rather than cygwin and msys2. To make current version of git-for-windows work, it is necessary to set code page to CP_UTF8 regardless locale. This does not make sense at all. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Corinna, On Wed, 2 Sep 2020 10:38:18 +0200 Corinna Vinschen wrote: > On Sep 2 10:30, Corinna Vinschen wrote: > > Ok guys, I'm not opposed to this change in terms of its result, > > but I'm starting to wonder why all this locale code in fhandler_tty > > is necessary at all. > > > > I see that get_langinfo() calls __loadlocale and performs a lot of stuff > > on the charsets which looks like duplicates of the initial_setlocale() > > call performed at DLL startup. > > > > If there's anything missing in the initial_setlocale() call which would > > be required by the pseudo tty code? What exactly is it? The codepage? > > And why can't we just add the info to cygheap->locale at initial_setlocale() > > time so it's available at exec time without going through all this hassle > > every time? > > > > Apart from that, all this locale/charset/lcid stuff should be concentrated > > in nlsfunc.cc ideally. > > get_locale_from_env() and get_langinfo() should go away. If we just > need a codepage for get_ttyp ()->term_code_page, we should really find a > way to do this from within internal_setlocale(). I looked into internal_setlocale() code, but I could not found the code which handles thecode page. I found the code handling the code page in __set_charset_from_locale() function in nlsfuncs.cc, but it does not return code page itself. Could you please explain more detail of your idea? -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi, On Tue, 1 Sep 2020, Johannes Schindelin wrote: > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > correct, but after that (at least if Pseudo Console support is enabled), > we try to find the default code page for that `LCID`, which is ASCII > (437). Subsequently, we set the Console output code page to that value, > completely ignoring that we wanted to use UTF-8. > > Let's not ignore the specifically asked-for UTF-8 character set. > > While at it, let's also set the Console output code page even if Pseudo > Console support is disabled; contrary to the behavior of v3.0.7, the > Console output code page is not ignored in that case. > > The most common symptom would be that console applications which do not > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > https://github.com/msys2/MSYS2-packages/issues/2012, > https://github.com/rust-lang/cargo/issues/8369, > https://github.com/git-for-windows/git/issues/2734, > https://github.com/git-for-windows/git/issues/2793, > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > few others. > > Signed-off-by: Johannes Schindelin > --- > winsup/cygwin/fhandler_tty.cc | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > index 06789a500..414c26992 100644 > --- a/winsup/cygwin/fhandler_tty.cc > +++ b/winsup/cygwin/fhandler_tty.cc > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void) >char charset[ENCODING_LEN + 1] = "ASCII"; >LCID lcid = get_langinfo (locale, charset); > > + /* Special-case the UTF-8 character set */ > + if (strcasecmp (charset, "UTF-8") == 0) > +{ > + get_ttyp ()->term_code_page = CP_UTF8; > + SetConsoleCP (CP_UTF8); > + SetConsoleOutputCP (CP_UTF8); > + return; > +} > + Just a word of warning: while this patch can be ported to a634adda5 (libm/machine/arm: Rename s*_fma.c -> s*_fma_arm.c, 2020-09-01) relatively easily (and the first two patches of this patch series cannot, as they are no longer applicable after the complete redesign of the Pseudo Console support), it only works as intended in the `disable_pcon` mode. The new design calls for Pseudo Consoles to be created per spawned console application. And I have not found any way to convince my local version of the runtime to change the code page of these Pseudo Consoles away from the rather unfortunate default 437. This is a problem. Take for example https://github.com/git-for-windows/git/issues/2793. Telling the users that they should patch node.js and recompile is probably not going to fly. Hopefully there is a way to fix this, otherwise Pseudo Console support will continue to be quite the support burden. Ciao, Johannes >/* Set console code page from locale */ >if (get_pseudo_console ()) > { > -- > 2.27.0 > >
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Takashi, On Wed, 2 Sep 2020, Takashi Yano via Cygwin-patches wrote: > On Wed, 2 Sep 2020 10:30:14 +0200 > Corinna Vinschen wrote: > > On Sep 1 18:19, Johannes Schindelin wrote: > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > > correct, but after that (at least if Pseudo Console support is enabled), > > > we try to find the default code page for that `LCID`, which is ASCII > > > (437). Subsequently, we set the Console output code page to that value, > > > completely ignoring that we wanted to use UTF-8. > > > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > > > While at it, let's also set the Console output code page even if Pseudo > > > Console support is disabled; contrary to the behavior of v3.0.7, the > > > Console output code page is not ignored in that case. > > > > > > The most common symptom would be that console applications which do not > > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > > https://github.com/msys2/MSYS2-packages/issues/2012, > > > https://github.com/rust-lang/cargo/issues/8369, > > > https://github.com/git-for-windows/git/issues/2734, > > > https://github.com/git-for-windows/git/issues/2793, > > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > > > few others. > > > > > > Signed-off-by: Johannes Schindelin > > > --- > > > winsup/cygwin/fhandler_tty.cc | 9 + > > > 1 file changed, 9 insertions(+) > > > > Ok guys, I'm not opposed to this change in terms of its result, > > I am sorry, but I cannot agree with Johannes's patch. > > For example, code page in Japan is CP932 by default. > In this case, cmd.exe, netsh.exe and so on are generate > messages in Japanese. > > If the code page is set to CP_UTF8, messages from such > commands changes to english. I guess similar things happen > for other locales. > > I do not prefer this result. > > Furthermore, I looked into the issue: > https://github.com/git-for-windows/git/issues/2734 > and I found that git-for-windows always use utf-8 > encoding even if the locale is ja_JP.CP932. > It does not change coding based on locale or code > page. > > Even with Johannes's patch, if mintty is started with > locale ja_JP.CP932, the file name will be garbled > bacause SetConsoleOutputCP(CP_UTF8) will not be called. > > IMHO, it is the problem of git-for-windows rather > than cygwin and msys2. > > To make current version of git-for-windows work, it is > necessary to set code page to CP_UTF8 regardless locale. > This does not make sense at all. You are misrepresenting the problem. It has nothing to do with Git for Windows. For example, if you run tests in an Angular project inside Cygwin's MinTTY, the output will be garbled because node.js (or the Angular libraries, I don't know) will interpret `LANG` or something. This is a much bigger problem than you make it sound. The console applications that do _not_ call `SetConsoleOutputCP()` are so ubiquituous. I think you are underestimating that problem rather dramatically. Ciao, Johannes
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Wed, 2 Sep 2020 08:26:04 +0200 (CEST) Johannes Schindelin wrote: > Hi Takashi, > > On Wed, 2 Sep 2020, Takashi Yano via Cygwin-patches wrote: > > > On Wed, 2 Sep 2020 10:30:14 +0200 > > Corinna Vinschen wrote: > > > On Sep 1 18:19, Johannes Schindelin wrote: > > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > > > correct, but after that (at least if Pseudo Console support is enabled), > > > > we try to find the default code page for that `LCID`, which is ASCII > > > > (437). Subsequently, we set the Console output code page to that value, > > > > completely ignoring that we wanted to use UTF-8. > > > > > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > > > > > While at it, let's also set the Console output code page even if Pseudo > > > > Console support is disabled; contrary to the behavior of v3.0.7, the > > > > Console output code page is not ignored in that case. > > > > > > > > The most common symptom would be that console applications which do not > > > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x. > > > > > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > > > https://github.com/msys2/MSYS2-packages/issues/2012, > > > > https://github.com/rust-lang/cargo/issues/8369, > > > > https://github.com/git-for-windows/git/issues/2734, > > > > https://github.com/git-for-windows/git/issues/2793, > > > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a > > > > few others. > > > > > > > > Signed-off-by: Johannes Schindelin > > > > --- > > > > winsup/cygwin/fhandler_tty.cc | 9 + > > > > 1 file changed, 9 insertions(+) > > > > > > Ok guys, I'm not opposed to this change in terms of its result, > > > > I am sorry, but I cannot agree with Johannes's patch. > > > > For example, code page in Japan is CP932 by default. > > In this case, cmd.exe, netsh.exe and so on are generate > > messages in Japanese. > > > > If the code page is set to CP_UTF8, messages from such > > commands changes to english. I guess similar things happen > > for other locales. > > > > I do not prefer this result. > > > > Furthermore, I looked into the issue: > > https://github.com/git-for-windows/git/issues/2734 > > and I found that git-for-windows always use utf-8 > > encoding even if the locale is ja_JP.CP932. > > It does not change coding based on locale or code > > page. > > > > Even with Johannes's patch, if mintty is started with > > locale ja_JP.CP932, the file name will be garbled > > bacause SetConsoleOutputCP(CP_UTF8) will not be called. > > > > IMHO, it is the problem of git-for-windows rather > > than cygwin and msys2. > > > > To make current version of git-for-windows work, it is > > necessary to set code page to CP_UTF8 regardless locale. > > This does not make sense at all. > > You are misrepresenting the problem. It has nothing to do with Git for > Windows. For example, if you run tests in an Angular project inside > Cygwin's MinTTY, the output will be garbled because node.js (or the > Angular libraries, I don't know) will interpret `LANG` or something. You listed these issues in git-for-windows: > > > > https://github.com/git-for-windows/git/issues/2734, > > > > https://github.com/git-for-windows/git/issues/2792, didn't you? Therefore, I looked into them. OK, I will check Angular/CLI next. But I am not familier with Agnular/CLI. Could you please provide simple steps to reproduce the problem? > This is a much bigger problem than you make it sound. The console > applications that do _not_ call `SetConsoleOutputCP()` are so > ubiquituous. I think you are underestimating that problem rather > dramatically. > > Ciao, > Johannes -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Takashi, On Wed, 2 Sep 2020, Takashi Yano wrote: > On Wed, 2 Sep 2020 08:26:04 +0200 (CEST) > Johannes Schindelin wrote: > > Hi Takashi, > > > > On Wed, 2 Sep 2020, Takashi Yano via Cygwin-patches wrote: > > > > > On Wed, 2 Sep 2020 10:30:14 +0200 > > > Corinna Vinschen wrote: > > > > On Sep 1 18:19, Johannes Schindelin wrote: > > > > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is > > > > > correct, but after that (at least if Pseudo Console support is > > > > > enabled), > > > > > we try to find the default code page for that `LCID`, which is ASCII > > > > > (437). Subsequently, we set the Console output code page to that > > > > > value, > > > > > completely ignoring that we wanted to use UTF-8. > > > > > > > > > > Let's not ignore the specifically asked-for UTF-8 character set. > > > > > > > > > > While at it, let's also set the Console output code page even if > > > > > Pseudo > > > > > Console support is disabled; contrary to the behavior of v3.0.7, the > > > > > Console output code page is not ignored in that case. > > > > > > > > > > The most common symptom would be that console applications which do > > > > > not > > > > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text > > > > > seem to be broken with v3.1.x when they worked plenty fine with > > > > > v3.0.x. > > > > > > > > > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974, > > > > > https://github.com/msys2/MSYS2-packages/issues/2012, > > > > > https://github.com/rust-lang/cargo/issues/8369, > > > > > https://github.com/git-for-windows/git/issues/2734, > > > > > https://github.com/git-for-windows/git/issues/2793, > > > > > https://github.com/git-for-windows/git/issues/2792, and possibly > > > > > quite a > > > > > few others. > > > > > > > > > > Signed-off-by: Johannes Schindelin > > > > > --- > > > > > winsup/cygwin/fhandler_tty.cc | 9 + > > > > > 1 file changed, 9 insertions(+) > > > > > > > > Ok guys, I'm not opposed to this change in terms of its result, > > > > > > I am sorry, but I cannot agree with Johannes's patch. > > > > > > For example, code page in Japan is CP932 by default. > > > In this case, cmd.exe, netsh.exe and so on are generate > > > messages in Japanese. > > > > > > If the code page is set to CP_UTF8, messages from such > > > commands changes to english. I guess similar things happen > > > for other locales. > > > > > > I do not prefer this result. > > > > > > Furthermore, I looked into the issue: > > > https://github.com/git-for-windows/git/issues/2734 > > > and I found that git-for-windows always use utf-8 > > > encoding even if the locale is ja_JP.CP932. > > > It does not change coding based on locale or code > > > page. > > > > > > Even with Johannes's patch, if mintty is started with > > > locale ja_JP.CP932, the file name will be garbled > > > bacause SetConsoleOutputCP(CP_UTF8) will not be called. > > > > > > IMHO, it is the problem of git-for-windows rather > > > than cygwin and msys2. > > > > > > To make current version of git-for-windows work, it is > > > necessary to set code page to CP_UTF8 regardless locale. > > > This does not make sense at all. > > > > You are misrepresenting the problem. It has nothing to do with Git for > > Windows. For example, if you run tests in an Angular project inside > > Cygwin's MinTTY, the output will be garbled because node.js (or the > > Angular libraries, I don't know) will interpret `LANG` or something. > > You listed these issues in git-for-windows: > > > > > https://github.com/git-for-windows/git/issues/2734, > > > > > https://github.com/git-for-windows/git/issues/2792, > didn't you? Therefore, I looked into them. > > OK, I will check Angular/CLI next. But I am not familier with > Agnular/CLI. Could you please provide simple steps to reproduce > the problem? Here is a report: https://github.com/git-for-windows/git/issues/2793 But why do you want to look into Angular/CLI? Do you really think that it makes sense to try to fix every console app out there? Really? Wouldn't it make more sense to just accept that there are many console applications that are broken by the recent change, and accommodate them in the Cygwin runtime? That would take substantially less time. Ciao, Johannes > > > This is a much bigger problem than you make it sound. The console > > applications that do _not_ call `SetConsoleOutputCP()` are so > > ubiquituous. I think you are underestimating that problem rather > > dramatically. > > > > Ciao, > > Johannes > > > -- > Takashi Yano >
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Wed, 2 Sep 2020 11:12:53 +0200 (CEST) Johannes Schindelin wrote: > On Wed, 2 Sep 2020, Takashi Yano wrote: > > OK, I will check Angular/CLI next. But I am not familier with > > Agnular/CLI. Could you please provide simple steps to reproduce > > the problem? > > Here is a report: https://github.com/git-for-windows/git/issues/2793 I already read that thread, and I have try following step. 1) Install node.js 2) npm install --global @angular/cli 3) ng new test-app 4) cd test-app 5) ng test --code-coverage However, the output is very differnt from the bug report, and there seems to be no garbled output. The output is something like: Compiling @angular/core : es2015 as esm2015 Compiling @angular/animations : es2015 as esm2015 Compiling @angular/compiler/testing : es2015 as esm2015 Compiling @angular/common : es2015 as esm2015 Compiling @angular/animations/browser : es2015 as esm2015 Compiling @angular/core/testing : es2015 as esm2015 Compiling @angular/animations/browser/testing : es2015 as esm2015 Compiling @angular/platform-browser : es2015 as esm2015 Compiling @angular/common/http : es2015 as esm2015 Compiling @angular/common/testing : es2015 as esm2015 Compiling @angular/router : es2015 as esm2015 Compiling @angular/forms : es2015 as esm2015 Compiling @angular/platform-browser-dynamic : es2015 as esm2015 Compiling @angular/platform-browser/testing : es2015 as esm2015 Compiling @angular/platform-browser/animations : es2015 as esm2015 Compiling @angular/common/http/testing : es2015 as esm2015 Compiling @angular/platform-browser-dynamic/testing : es2015 as esm2015 Compiling @angular/router/testing : es2015 as esm2015 10% building 2/2 modules 0 active02 09 2020 23:49:25.110:WARN [karma]: No captur ed browser, open http://localhost:9876/ 02 09 2020 23:49:25.118:INFO [karma-server]: Karma v5.0.9 server started at http ://0.0.0.0:9876/ 02 09 2020 23:49:25.119:INFO [launcher]: Launching browsers Chrome with concurre ncy unlimited 02 09 2020 23:49:25.125:INFO [launcher]: Starting browser Chrome 92% additional asset processing copy-webpack-plugin02 09 2020 23:49:34.003:WARN [karma]: No captured browser, open http://localhost:9876/ 02 09 2020 23:49:34.907:INFO [Chrome 85.0.4183.83 (Windows 10)]: Connected on so cket z7khH23JfW4v-_Tt with id 59608191 Chrome 85.0.4183.83 (Windows 10): Executed 3 of 3 SUCCESS (0.62 secs / 0.273 sec s) TOTAL: 3 SUCCESS TOTAL: 3 SUCCESS TOTAL: 3 SUCCESS === Coverage summary === Statements : 100% ( 6/6 ) Branches : 100% ( 0/0 ) Functions: 100% ( 0/0 ) Lines: 100% ( 5/5 ) What's wrong? > But why do you want to look into Angular/CLI? Do you really think that it > makes sense to try to fix every console app out there? Really? Wouldn't it > make more sense to just accept that there are many console applications > that are broken by the recent change, and accommodate them in the Cygwin > runtime? That would take substantially less time. It is important to know what's happning actually to fix the issue. Superficial patch makes the problem more complicated. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 2 19:54, Takashi Yano via Cygwin-patches wrote: > Hi Corinna, > > On Wed, 2 Sep 2020 10:38:18 +0200 > Corinna Vinschen wrote: > > On Sep 2 10:30, Corinna Vinschen wrote: > > > Ok guys, I'm not opposed to this change in terms of its result, > > > but I'm starting to wonder why all this locale code in fhandler_tty > > > is necessary at all. > > > > > > I see that get_langinfo() calls __loadlocale and performs a lot of stuff > > > on the charsets which looks like duplicates of the initial_setlocale() > > > call performed at DLL startup. > > > > > > If there's anything missing in the initial_setlocale() call which would > > > be required by the pseudo tty code? What exactly is it? The codepage? > > > And why can't we just add the info to cygheap->locale at > > > initial_setlocale() > > > time so it's available at exec time without going through all this hassle > > > every time? > > > > > > Apart from that, all this locale/charset/lcid stuff should be concentrated > > > in nlsfunc.cc ideally. > > > > get_locale_from_env() and get_langinfo() should go away. If we just > > need a codepage for get_ttyp ()->term_code_page, we should really find a > > way to do this from within internal_setlocale(). > > I looked into internal_setlocale() code, but I could not found > the code which handles thecode page. I found the code handling > the code page in __set_charset_from_locale() function in nlsfuncs.cc, > but it does not return code page itself. Could you please explain > more detail of your idea? I had none yet :) I was just musing, without actually thinking about a solution. But I think this isn't very complicated. Given this is inside Cygwin, nothing keeps the function to have a well-defined side-effect, as in setting a (not yet existing) member "term_code_page" of cygheap->locale. Kind of like this: diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h index 8877cc358c39..2b84f4252071 100644 --- a/winsup/cygwin/cygheap.h +++ b/winsup/cygwin/cygheap.h @@ -341,6 +341,7 @@ struct cygheap_debug struct cygheap_locale { mbtowc_p mbtowc; + UINT term_code_page; }; struct user_heap_info diff --git a/winsup/cygwin/nlsfuncs.cc b/winsup/cygwin/nlsfuncs.cc index 668d7eb9e778..752f4239d911 100644 --- a/winsup/cygwin/nlsfuncs.cc +++ b/winsup/cygwin/nlsfuncs.cc @@ -1298,6 +1298,9 @@ __set_charset_from_locale (const char *locale, char *charset) LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER, (PWCHAR) &cp, sizeof cp)) cp = 0; + /* Store codepage in cygheap->locale so fhandler_tty can switch the + pseudo console to the correct codepage. */ + cygheap->locale.term_code_page = cp ?: CP_UTF8; /* Translate codepage and lcid to a charset closely aligned with the default charsets defined in Glibc. */ const char *cs; Make sense? Thanks, Corinna
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 2 17:24, Corinna Vinschen wrote: > On Sep 2 19:54, Takashi Yano via Cygwin-patches wrote: > > Hi Corinna, > > > > On Wed, 2 Sep 2020 10:38:18 +0200 > > Corinna Vinschen wrote: > > > On Sep 2 10:30, Corinna Vinschen wrote: > > > > Ok guys, I'm not opposed to this change in terms of its result, > > > > but I'm starting to wonder why all this locale code in fhandler_tty > > > > is necessary at all. > > > > > > > > I see that get_langinfo() calls __loadlocale and performs a lot of stuff > > > > on the charsets which looks like duplicates of the initial_setlocale() > > > > call performed at DLL startup. > > > > > > > > If there's anything missing in the initial_setlocale() call which would > > > > be required by the pseudo tty code? What exactly is it? The codepage? > > > > And why can't we just add the info to cygheap->locale at > > > > initial_setlocale() > > > > time so it's available at exec time without going through all this > > > > hassle > > > > every time? > > > > > > > > Apart from that, all this locale/charset/lcid stuff should be > > > > concentrated > > > > in nlsfunc.cc ideally. > > > > > > get_locale_from_env() and get_langinfo() should go away. If we just > > > need a codepage for get_ttyp ()->term_code_page, we should really find a > > > way to do this from within internal_setlocale(). > > > > I looked into internal_setlocale() code, but I could not found > > the code which handles thecode page. I found the code handling > > the code page in __set_charset_from_locale() function in nlsfuncs.cc, > > but it does not return code page itself. Could you please explain > > more detail of your idea? > > I had none yet :) I was just musing, without actually thinking about a > solution. But I think this isn't very complicated. Given this is > inside Cygwin, nothing keeps the function to have a well-defined > side-effect, as in setting a (not yet existing) member "term_code_page" > of cygheap->locale. > > Kind of like this: Actually, this is a bit too simple, but you get the idea. We need to align the terminal codepage with the actual Cygwin charset, along the lines of what your setup_locale is doing standalone yet. Except in case of ASCII, where we default to UTF-8 internally. The important part here is that we do this once, and that we don't have unnecessary code duplication. Corinna
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Corinna, On Wed, 2 Sep 2020 17:24:50 +0200 Corinna Vinschen wrote: > On Sep 2 19:54, Takashi Yano via Cygwin-patches wrote: > > Hi Corinna, > > > > On Wed, 2 Sep 2020 10:38:18 +0200 > > Corinna Vinschen wrote: > > > On Sep 2 10:30, Corinna Vinschen wrote: > > > > Ok guys, I'm not opposed to this change in terms of its result, > > > > but I'm starting to wonder why all this locale code in fhandler_tty > > > > is necessary at all. > > > > > > > > I see that get_langinfo() calls __loadlocale and performs a lot of stuff > > > > on the charsets which looks like duplicates of the initial_setlocale() > > > > call performed at DLL startup. > > > > > > > > If there's anything missing in the initial_setlocale() call which would > > > > be required by the pseudo tty code? What exactly is it? The codepage? > > > > And why can't we just add the info to cygheap->locale at > > > > initial_setlocale() > > > > time so it's available at exec time without going through all this > > > > hassle > > > > every time? > > > > > > > > Apart from that, all this locale/charset/lcid stuff should be > > > > concentrated > > > > in nlsfunc.cc ideally. > > > > > > get_locale_from_env() and get_langinfo() should go away. If we just > > > need a codepage for get_ttyp ()->term_code_page, we should really find a > > > way to do this from within internal_setlocale(). > > > > I looked into internal_setlocale() code, but I could not found > > the code which handles thecode page. I found the code handling > > the code page in __set_charset_from_locale() function in nlsfuncs.cc, > > but it does not return code page itself. Could you please explain > > more detail of your idea? > > I had none yet :) I was just musing, without actually thinking about a > solution. But I think this isn't very complicated. Given this is > inside Cygwin, nothing keeps the function to have a well-defined > side-effect, as in setting a (not yet existing) member "term_code_page" > of cygheap->locale. > > Kind of like this: > > diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h > index 8877cc358c39..2b84f4252071 100644 > --- a/winsup/cygwin/cygheap.h > +++ b/winsup/cygwin/cygheap.h > @@ -341,6 +341,7 @@ struct cygheap_debug > struct cygheap_locale > { >mbtowc_p mbtowc; > + UINT term_code_page; > }; > > struct user_heap_info > diff --git a/winsup/cygwin/nlsfuncs.cc b/winsup/cygwin/nlsfuncs.cc > index 668d7eb9e778..752f4239d911 100644 > --- a/winsup/cygwin/nlsfuncs.cc > +++ b/winsup/cygwin/nlsfuncs.cc > @@ -1298,6 +1298,9 @@ __set_charset_from_locale (const char *locale, char > *charset) > LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER, > (PWCHAR) &cp, sizeof cp)) > cp = 0; > + /* Store codepage in cygheap->locale so fhandler_tty can switch the > + pseudo console to the correct codepage. */ > + cygheap->locale.term_code_page = cp ?: CP_UTF8; >/* Translate codepage and lcid to a charset closely aligned with the > default > charsets defined in Glibc. */ >const char *cs; > > Make sense? I have tried your code, however, it does not work as expected. It seems that __set_charset_from_locale() is not called. cygheap->locale.term_code_page is always 0. I have added following lines into setup_locale() to make sure to call __set_charset_from_locale() for a test, setlocale (LC_ALL, ""); __set_charset_from_locale (__get_global_locale()->categories[LC_CTYPE], charset); get_ttyp ()->term_code_page = cygheap->locale.term_code_page; however, term_code_page is set to 932 if locale is ja_JP.UTF-8. In this case term_code_page should be CP_UTF8 (65001). The code page retrieved in __set_charset_from_locale() is not based on "UTF-8" but "ja_JP". Let me consider a while. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Takashi, On Sep 3 01:25, Takashi Yano via Cygwin-patches wrote: > Hi Corinna, > > On Wed, 2 Sep 2020 17:24:50 +0200 > Corinna Vinschen wrote: > > > > get_locale_from_env() and get_langinfo() should go away. If we just > > > > need a codepage for get_ttyp ()->term_code_page, we should really find a > > > > way to do this from within internal_setlocale(). > > > > > > I looked into internal_setlocale() code, but I could not found > > > the code which handles thecode page. I found the code handling > > > the code page in __set_charset_from_locale() function in nlsfuncs.cc, > > > but it does not return code page itself. Could you please explain > > > more detail of your idea? > > > > I had none yet :) I was just musing, without actually thinking about a > > solution. But I think this isn't very complicated. Given this is > > inside Cygwin, nothing keeps the function to have a well-defined > > side-effect, as in setting a (not yet existing) member "term_code_page" > > of cygheap->locale. > > > > Kind of like this: > > [...] > I have tried your code, however, it does not work as expected. > It seems that __set_charset_from_locale() is not called. > cygheap->locale.term_code_page is always 0. Ah, right! Take a look into newlib/libc/locale/locale.c, function __loadlocale(). This function is called from _setlocale_r(). However, it calls __set_charset_from_locale() *only* if the charset isn't already given explicitely in the LC_* or LANG string, because otherwise we already know the charset, after all. Darn! That foils my plans for world domination... > Let me consider a while. Thanks, I'll do the same. I'd really like to simplify this stuff and doing the locale shuffle in two entirely different locations at different times is prone to getting out of sync. Corinna