Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
On Sep 8 17:16, Takashi Yano via Cygwin-patches wrote: > On Mon, 7 Sep 2020 23:17:36 +0200 (CEST) > Johannes Schindelin wrote: > > Hi Takashi, > > > > On Sat, 5 Sep 2020, Takashi Yano wrote: > > > > > On Fri, 4 Sep 2020 08:23:42 +0200 (CEST) > > > Johannes Schindelin wrote: > > > > > > > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote: > > > > > > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST) > > > > > Johannes Schindelin wrote: > > > > > [...] > > `LANG=en_US.UTF-8` (meaning your patches force their console applications' > > output to be interpreted with code page 437) and therefore for those > > users, things looked fine before, and now they don't. > > > > Note that I am not talking about developers who develop said console > > applications. I am talking about users who use those console applications. > > In other words, I am talking about a vastly larger group of affected > > people. > > > > All of those people (or at least a substantial majority) will now have to > > be told to please disable Pseudo Console support in v3.2.0 because they > > would have to patch and rebuild those console applications that don't call > > `SetConsoleOutputCP()`, and that is certainly unreasonable to expect of > > the majority of users. Not even the `cmd /c chcp 65001` work-around (that > > helps with v3.1.7) will work with v3.2.0 when Pseudo Console support is > > enabled. > > In the case where pseudo console is disabled, the patch I proposed in > https://cygwin.com/pipermail/cygwin-patches/2020q3/010548.html > will solve the issue. I mean apps which work correctly in cygwin 3.0.7 > work in cygwin 3.2.0 as well with that patch. > > OTOH, in the case where pseudo console is enabled, non-cygwin apps which > work correctly in command prompt, work in cygwin 3.2.0 as well. > > It is enough for all, isn't it? > > You may think that all non-cygwin apps which work in cygwin 3.0.7 must > work in cygwin 3.2.0 even when pseudo console is enabled, but it is > against the purpose of the pseudo console support. The goal of pseudo > console support is to make non-cygwin apps work as if it is executed in > command prompt. > > By the way, you complained regarding garbled output of the program which > outputs UTF-8 string regardless of locale. > https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html > However, many Japanese programmers, for example, make programs which > output SJIS (CP932) string regardless of locale. > > Why do you think the former must work while the latter deos not have to? > Is there any reasonable reason other than backward compatibility? Is that still a concern with the latest from master? There's a snapshot for testing, Johannes. Takashi, does the patch from https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html still apply to the latest from master? Question is, shouldn't the Windows calls setting the codepage be only called if started from child_info_spawn::worker for non-Cygwin executables? Corinna
Re: [PATCH] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
Hi Corinna, On Tue, 8 Sep 2020 20:42:47 +0200 Corinna Vinschen wrote: > Hi Takashi, > > On Sep 8 18:57, Takashi Yano via Cygwin-patches wrote: > > - If the non-cygwin apps is executed under pseudo console disabled, > > multibyte input for the apps are garbled. This patch fixes the > > issue. > > --- > > winsup/cygwin/fhandler_tty.cc | 13 - > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > > index 6de591d9b..afaa4546e 100644 > > --- a/winsup/cygwin/fhandler_tty.cc > > +++ b/winsup/cygwin/fhandler_tty.cc > > @@ -271,8 +271,17 @@ fhandler_pty_master::accept_input () > >bytes_left = eat_readahead (-1); > > > >HANDLE write_to = get_output_handle (); > > + char *buf = NULL; > >if (to_be_read_from_pcon ()) > > -write_to = to_slave; > > +{ > > + write_to = to_slave; > > + size_t nlen; > > + buf = convert_mb_str (GetConsoleCP (), &nlen, > > + get_ttyp ()->term_code_page, > > + (const char *) p, bytes_left); > > + p = buf; > > + bytes_left = nlen; > > +} > > How big are chances that the string in p is larger than 32767 chars? > > I'd like to see convert_mb_str use a tmp_pathbuf buffer instead of > calling HeapAlloc/HeapFree every time. That also drops the mb_str_free > entirely. > > Isn't there a problem anyway with calling convert_mb_str? Consider > a write call which stops in the middle of a multibyte char, the > second half only sent with the next write call. convert_mb_str > only allows to convert complete multibyte chars, and the caller does > not keep something like an mbstate_t around, which would allow > continuation of split multibyte chars. Thanks for the advice. I will submit a series of patches which reflect your advice. -- Takashi Yano
[PATCH 2/2] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
- If the non-cygwin apps is executed under pseudo console disabled, multibyte input for the apps are garbled. This patch fixes the issue. --- winsup/cygwin/fhandler_tty.cc | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index e7485af72..7fa8c99e8 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -309,8 +309,18 @@ fhandler_pty_master::accept_input () bytes_left = eat_readahead (-1); HANDLE write_to = get_output_handle (); + tmp_pathbuf tp; + char *mbbuf = tp.c_get (); if (to_be_read_from_pcon ()) -write_to = to_slave; +{ + write_to = to_slave; + static mbpend_t mbpend; + size_t nlen = NT_MAX_PATH; + convert_mb_str (GetConsoleCP (), mbbuf, &nlen, + get_ttyp ()->term_code_page, p, bytes_left, &mbpend); + p = mbbuf; + bytes_left = nlen; +} if (!bytes_left) { -- 2.28.0
[PATCH 0/2] Cygwin: pty: Changes regarding charset conversion.
Takashi Yano (2): Cygwin: pty: Revise convert_mb_str() function. Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon. winsup/cygwin/fhandler_tty.cc | 135 +++--- 1 file changed, 93 insertions(+), 42 deletions(-) -- 2.28.0
[PATCH 1/2] Cygwin: pty: Revise convert_mb_str() function.
- Use tmp_pathbuf instead of HeapAlloc()/HeapFree(). - Remove mb_str_free() function. - Consider the case where the multibyte string stops in the middle of a multibyte char. --- winsup/cygwin/fhandler_tty.cc | 123 ++ 1 file changed, 82 insertions(+), 41 deletions(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 6de591d9b..e7485af72 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -26,6 +26,7 @@ details. */ #include #include "cygwait.h" #include "registry.h" +#include "tls_pbuf.h" #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016 @@ -116,40 +117,77 @@ CreateProcessW_Hooked return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi); } -static char * -convert_mb_str (UINT cp_to, size_t *len_to, - UINT cp_from, const char *ptr_from, size_t len_from) +typedef struct { + int len; + char buf[5]; +} mbpend_t; + +static void +convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to, + UINT cp_from, const char *ptr_from, size_t len_from, + mbpend_t *pend) { - char *buf; size_t nlen; + tmp_pathbuf tp; if (cp_to != cp_from) { - size_t wlen = - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0); - wchar_t *wbuf = (wchar_t *) - HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t)); - wlen = - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen); - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL); - buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen); - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, NULL); - HeapFree (GetProcessHeap (), 0, wbuf); + wchar_t *wbuf = tp.w_get (); + int wlen = 0; + if (cp_from == 65000) /* UTF-7 */ + /* MB_ERR_INVALID_CHARS does not work properly for UTF-7. + Therefore, just convert string without checking */ + wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from, + wbuf, NT_MAX_PATH); + else + { + char *tmpbuf = tp.c_get (); + memcpy (tmpbuf, pend->buf, pend->len); + if (pend->len + len_from > NT_MAX_PATH) + len_from = NT_MAX_PATH - pend->len; + memcpy (tmpbuf + pend->len, ptr_from, len_from); + int total_len = pend->len + len_from; + pend->len = 0; + int mblen = 0; + for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen) + /* Max bytes in multibyte char is 6. */ + for (mblen = 1; mblen <= 6; mblen ++) + { + /* Try conversion */ + int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS, +p, mblen, +wbuf + wlen, NT_MAX_PATH - wlen); + if (l) + { /* Conversion Success */ + wlen += l; + break; + } + else if (mblen == 6) + { /* Conversion Fail */ + l = MultiByteToWideChar (cp_from, 0, p, 1, +wbuf + wlen, NT_MAX_PATH - wlen); + wlen += l; + mblen = 1; + break; + } + else if (p + mblen == tmpbuf + total_len) + { /* Multibyte char incomplete */ + memcpy (pend->buf, p, mblen); + pend->len = mblen; + break; + } + /* Retry conversion with extended length */ + } + } + nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, + ptr_to, *len_to, NULL, NULL); } else { /* Just copy */ - buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from); - memcpy (buf, ptr_from, len_from); - nlen = len_from; + nlen = min (*len_to, len_from); + memcpy (ptr_to, ptr_from, nlen); } *len_to = nlen; - return buf; -} - -static void -mb_str_free (char *ptr) -{ - HeapFree (GetProcessHeap (), 0, ptr); } static bool @@ -1613,9 +1651,13 @@ fhandler_pty_master::write (const void *ptr, size_t len) if current application is native console application. */ if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console) { - size_t nlen; - char *buf = convert_mb_str (CP_UTF8, &nlen, get_ttyp ()->term_code_page, - (const char *) ptr, len); + tmp_pathbuf tp; + char *buf = tp.c_get (); + static mbpend_t mbpend; + size_t nlen = NT_MAX_PATH; + convert_mb_str (CP_UTF8, buf, &nlen, + get_ttyp ()->term_code_page, (const char *) ptr, len, + &mbpend); WaitForSingleObject (input_mutex, INFINITE); @
Re: [PATCH 1/2] Cygwin: pty: Revise convert_mb_str() function.
On Wed, 9 Sep 2020 17:07:20 +0900 Takashi Yano wrote: > + /* Max bytes in multibyte char is 6. */ > + for (mblen = 1; mblen <= 6; mblen ++) Sorry, I misunderstood that the max utf-8 char length is 6. Actually, it is 4, therefore mbstate_t can be used. I will submit v2 patch. -- Takashi Yano
[PATCH v2 1/2] Cygwin: pty: Revise convert_mb_str() function.
- Use tmp_pathbuf instead of HeapAlloc()/HeapFree(). - Remove mb_str_free() function. - Consider the case where the multibyte string stops in the middle of a multibyte char. --- winsup/cygwin/fhandler_tty.cc | 118 ++ 1 file changed, 77 insertions(+), 41 deletions(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 6de591d9b..c4b7fc61d 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -26,6 +26,7 @@ details. */ #include #include "cygwait.h" #include "registry.h" +#include "tls_pbuf.h" #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016 @@ -116,40 +117,72 @@ CreateProcessW_Hooked return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi); } -static char * -convert_mb_str (UINT cp_to, size_t *len_to, - UINT cp_from, const char *ptr_from, size_t len_from) +static void +convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to, + UINT cp_from, const char *ptr_from, size_t len_from, + mbstate_t *stat) { - char *buf; size_t nlen; + tmp_pathbuf tp; if (cp_to != cp_from) { - size_t wlen = - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0); - wchar_t *wbuf = (wchar_t *) - HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t)); - wlen = - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen); - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL); - buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen); - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, NULL); - HeapFree (GetProcessHeap (), 0, wbuf); + wchar_t *wbuf = tp.w_get (); + int wlen = 0; + if (cp_from == 65000) /* UTF-7 */ + /* MB_ERR_INVALID_CHARS does not work properly for UTF-7. + Therefore, just convert string without checking */ + wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from, + wbuf, NT_MAX_PATH); + else + { + char *tmpbuf = tp.c_get (); + memcpy (tmpbuf, stat->__value.__wchb, stat->__count); + if (stat->__count + len_from > NT_MAX_PATH) + len_from = NT_MAX_PATH - stat->__count; + memcpy (tmpbuf + stat->__count, ptr_from, len_from); + int total_len = stat->__count + len_from; + stat->__count = 0; + int mblen = 0; + for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen) + /* Max bytes in multibyte char is 4. */ + for (mblen = 1; mblen <= 4; mblen ++) + { + /* Try conversion */ + int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS, +p, mblen, +wbuf + wlen, NT_MAX_PATH - wlen); + if (l) + { /* Conversion Success */ + wlen += l; + break; + } + else if (mblen == 4) + { /* Conversion Fail */ + l = MultiByteToWideChar (cp_from, 0, p, 1, +wbuf + wlen, NT_MAX_PATH - wlen); + wlen += l; + mblen = 1; + break; + } + else if (p + mblen == tmpbuf + total_len) + { /* Multibyte char incomplete */ + memcpy (stat->__value.__wchb, p, mblen); + stat->__count = mblen; + break; + } + /* Retry conversion with extended length */ + } + } + nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, + ptr_to, *len_to, NULL, NULL); } else { /* Just copy */ - buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from); - memcpy (buf, ptr_from, len_from); - nlen = len_from; + nlen = min (*len_to, len_from); + memcpy (ptr_to, ptr_from, nlen); } *len_to = nlen; - return buf; -} - -static void -mb_str_free (char *ptr) -{ - HeapFree (GetProcessHeap (), 0, ptr); } static bool @@ -1613,9 +1646,13 @@ fhandler_pty_master::write (const void *ptr, size_t len) if current application is native console application. */ if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console) { - size_t nlen; - char *buf = convert_mb_str (CP_UTF8, &nlen, get_ttyp ()->term_code_page, - (const char *) ptr, len); + tmp_pathbuf tp; + char *buf = tp.c_get (); + static mbstate_t mbstat; + size_t nlen = NT_MAX_PATH; + convert_mb_str (CP_UTF8, buf, &nlen, + get_ttyp ()->term_code_page, (const char *) ptr, len, + &mbstat); WaitForSingleObject (input_mutex, INFINITE); @@ -1650,7
[PATCH v2 2/2] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
- If the non-cygwin apps is executed under pseudo console disabled, multibyte input for the apps are garbled. This patch fixes the issue. --- winsup/cygwin/fhandler_tty.cc | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index c4b7fc61d..701381ea6 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -304,8 +304,18 @@ fhandler_pty_master::accept_input () bytes_left = eat_readahead (-1); HANDLE write_to = get_output_handle (); + tmp_pathbuf tp; + char *mbbuf = tp.c_get (); if (to_be_read_from_pcon ()) -write_to = to_slave; +{ + write_to = to_slave; + static mbstate_t mbstat; + size_t nlen = NT_MAX_PATH; + convert_mb_str (GetConsoleCP (), mbbuf, &nlen, + get_ttyp ()->term_code_page, p, bytes_left, &mbstat); + p = mbbuf; + bytes_left = nlen; +} if (!bytes_left) { -- 2.28.0
[PATCH v2 0/2] Cygwin: pty: Changes regarding charset conversion.
Takashi Yano (2): Cygwin: pty: Revise convert_mb_str() function. Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon. winsup/cygwin/fhandler_tty.cc | 130 +++--- 1 file changed, 88 insertions(+), 42 deletions(-) -- 2.28.0
Re: [PATCH v2 1/2] Cygwin: pty: Revise convert_mb_str() function.
On Sep 9 22:40, Takashi Yano via Cygwin-patches wrote: > - Use tmp_pathbuf instead of HeapAlloc()/HeapFree(). > - Remove mb_str_free() function. > - Consider the case where the multibyte string stops in the middle > of a multibyte char. > --- > winsup/cygwin/fhandler_tty.cc | 118 ++ > 1 file changed, 77 insertions(+), 41 deletions(-) > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc > index 6de591d9b..c4b7fc61d 100644 > --- a/winsup/cygwin/fhandler_tty.cc > +++ b/winsup/cygwin/fhandler_tty.cc > @@ -26,6 +26,7 @@ details. */ > #include > #include "cygwait.h" > #include "registry.h" > +#include "tls_pbuf.h" > > #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE > #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016 > @@ -116,40 +117,72 @@ CreateProcessW_Hooked >return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi); > } > > -static char * > -convert_mb_str (UINT cp_to, size_t *len_to, > - UINT cp_from, const char *ptr_from, size_t len_from) > +static void > +convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to, > + UINT cp_from, const char *ptr_from, size_t len_from, > + mbstate_t *stat) Heh, it's funny to use mbstate_t together with Windows functions :) However, the var name `stat' makes me itch. It looks too close to struct stat and function stat. What about "mbp" or something along these lines? > { > - char *buf; >size_t nlen; > + tmp_pathbuf tp; >if (cp_to != cp_from) > { > - size_t wlen = > - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0); > - wchar_t *wbuf = (wchar_t *) > - HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t)); > - wlen = > - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen); > - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL); > - buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen); > - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, > NULL); > - HeapFree (GetProcessHeap (), 0, wbuf); > + wchar_t *wbuf = tp.w_get (); > + int wlen = 0; > + if (cp_from == 65000) /* UTF-7 */ CP_UTF7? > + /* MB_ERR_INVALID_CHARS does not work properly for UTF-7. > +Therefore, just convert string without checking */ > + wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from, > + wbuf, NT_MAX_PATH); > + else > + { > + char *tmpbuf = tp.c_get (); > + memcpy (tmpbuf, stat->__value.__wchb, stat->__count); > + if (stat->__count + len_from > NT_MAX_PATH) > + len_from = NT_MAX_PATH - stat->__count; > + memcpy (tmpbuf + stat->__count, ptr_from, len_from); > + int total_len = stat->__count + len_from; > + stat->__count = 0; > + int mblen = 0; > + for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen) > + /* Max bytes in multibyte char is 4. */ > + for (mblen = 1; mblen <= 4; mblen ++) > + { > + /* Try conversion */ > + int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS, > + p, mblen, > + wbuf + wlen, NT_MAX_PATH - wlen); > + if (l) > + { /* Conversion Success */ > + wlen += l; > + break; > + } > + else if (mblen == 4) > + { /* Conversion Fail */ > + l = MultiByteToWideChar (cp_from, 0, p, 1, > + wbuf + wlen, NT_MAX_PATH - wlen); > + wlen += l; > + mblen = 1; > + break; > + } > + else if (p + mblen == tmpbuf + total_len) > + { /* Multibyte char incomplete */ > + memcpy (stat->__value.__wchb, p, mblen); > + stat->__count = mblen; > + break; > + } > + /* Retry conversion with extended length */ > + } > + } > + nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, > + ptr_to, *len_to, NULL, NULL); > } >else > { >/* Just copy */ > - buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from); > - memcpy (buf, ptr_from, len_from); > - nlen = len_from; > + nlen = min (*len_to, len_from); > + memcpy (ptr_to, ptr_from, nlen); if the *caller* checks for cp_to != cp_from, this memcpy could go away and the caller could just use the already existing buffer. Other than that, LGTM. Thanks, Corinna
[PATCH v3 0/2] Cygwin: pty: Changes regarding charset conversion.
Takashi Yano (2): Cygwin: pty: Revise convert_mb_str() function. Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon. winsup/cygwin/fhandler_tty.cc | 151 +++--- 1 file changed, 103 insertions(+), 48 deletions(-) -- 2.28.0
[PATCH v3 2/2] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
- If the non-cygwin apps is executed under pseudo console disabled, multibyte input for the apps are garbled. This patch fixes the issue. --- winsup/cygwin/fhandler_tty.cc | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 0bfc32ea9..1b021661f 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -295,8 +295,22 @@ fhandler_pty_master::accept_input () bytes_left = eat_readahead (-1); HANDLE write_to = get_output_handle (); + tmp_pathbuf tp; if (to_be_read_from_pcon ()) -write_to = to_slave; +{ + write_to = to_slave; + UINT cp_to = GetConsoleCP (); + if (get_ttyp ()->term_code_page != cp_to) + { + static mbstate_t mbp; + char *mbbuf = tp.c_get (); + size_t nlen = NT_MAX_PATH; + convert_mb_str (cp_to, mbbuf, &nlen, + get_ttyp ()->term_code_page, p, bytes_left, &mbp); + p = mbbuf; + bytes_left = nlen; + } +} if (!bytes_left) { -- 2.28.0
[PATCH v3 1/2] Cygwin: pty: Revise convert_mb_str() function.
- Use tmp_pathbuf instead of HeapAlloc()/HeapFree(). - Remove mb_str_free() function. - Consider the case where the multibyte string stops in the middle of a multibyte char. --- winsup/cygwin/fhandler_tty.cc | 135 ++ 1 file changed, 88 insertions(+), 47 deletions(-) diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 6de591d9b..0bfc32ea9 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -26,6 +26,7 @@ details. */ #include #include "cygwait.h" #include "registry.h" +#include "tls_pbuf.h" #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016 @@ -116,40 +117,63 @@ CreateProcessW_Hooked return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi); } -static char * -convert_mb_str (UINT cp_to, size_t *len_to, - UINT cp_from, const char *ptr_from, size_t len_from) +static void +convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to, + UINT cp_from, const char *ptr_from, size_t len_from, + mbstate_t *mbp) { - char *buf; size_t nlen; - if (cp_to != cp_from) -{ - size_t wlen = - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0); - wchar_t *wbuf = (wchar_t *) - HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t)); - wlen = - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen); - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL); - buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen); - nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, NULL); - HeapFree (GetProcessHeap (), 0, wbuf); -} + tmp_pathbuf tp; + wchar_t *wbuf = tp.w_get (); + int wlen = 0; + if (cp_from == CP_UTF7) +/* MB_ERR_INVALID_CHARS does not work properly for UTF-7. + Therefore, just convert string without checking */ +wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from, + wbuf, NT_MAX_PATH); else { - /* Just copy */ - buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from); - memcpy (buf, ptr_from, len_from); - nlen = len_from; + char *tmpbuf = tp.c_get (); + memcpy (tmpbuf, mbp->__value.__wchb, mbp->__count); + if (mbp->__count + len_from > NT_MAX_PATH) + len_from = NT_MAX_PATH - mbp->__count; + memcpy (tmpbuf + mbp->__count, ptr_from, len_from); + int total_len = mbp->__count + len_from; + mbp->__count = 0; + int mblen = 0; + for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen) + /* Max bytes in multibyte char is 4. */ + for (mblen = 1; mblen <= 4; mblen ++) + { + /* Try conversion */ + int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS, +p, mblen, +wbuf + wlen, NT_MAX_PATH - wlen); + if (l) + { /* Conversion Success */ + wlen += l; + break; + } + else if (mblen == 4) + { /* Conversion Fail */ + l = MultiByteToWideChar (cp_from, 0, p, 1, +wbuf + wlen, NT_MAX_PATH - wlen); + wlen += l; + mblen = 1; + break; + } + else if (p + mblen == tmpbuf + total_len) + { /* Multibyte char incomplete */ + memcpy (mbp->__value.__wchb, p, mblen); + mbp->__count = mblen; + break; + } + /* Retry conversion with extended length */ + } } + nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, + ptr_to, *len_to, NULL, NULL); *len_to = nlen; - return buf; -} - -static void -mb_str_free (char *ptr) -{ - HeapFree (GetProcessHeap (), 0, ptr); } static bool @@ -1613,9 +1637,18 @@ fhandler_pty_master::write (const void *ptr, size_t len) if current application is native console application. */ if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console) { - size_t nlen; - char *buf = convert_mb_str (CP_UTF8, &nlen, get_ttyp ()->term_code_page, - (const char *) ptr, len); + tmp_pathbuf tp; + char *buf = (char *) ptr; + size_t nlen = len; + if (get_ttyp ()->term_code_page != CP_UTF8) + { + static mbstate_t mbp; + buf = tp.c_get (); + nlen = NT_MAX_PATH; + convert_mb_str (CP_UTF8, buf, &nlen, + get_ttyp ()->term_code_page, (const char *) ptr, len, + &mbp); + } WaitForSingleObject (input_mutex, INFINITE); @@ -1650,7 +1683,6 @@ fhandler_pty_master::write (const void *ptr, size_t len) get_ttyp ()->pcon_start = false; } ReleaseMutex (input_mutex); -
Re: [PATCH v3 0/2] Cygwin: pty: Changes regarding charset conversion.
On Sep 10 00:27, Takashi Yano via Cygwin-patches wrote: > Takashi Yano (2): > Cygwin: pty: Revise convert_mb_str() function. > Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon. > > winsup/cygwin/fhandler_tty.cc | 151 +++--- > 1 file changed, 103 insertions(+), 48 deletions(-) Pushed. Do we really have to support UTF-7? Is anybody actually using it? Usually Cygwin does not support UTF-7 at all. Thanks, Corinna
Re: [PATCH v3 0/2] Cygwin: pty: Changes regarding charset conversion.
On Wed, 9 Sep 2020 23:46:04 +0200 Corinna Vinschen wrote: > On Sep 10 00:27, Takashi Yano via Cygwin-patches wrote: > > Takashi Yano (2): > > Cygwin: pty: Revise convert_mb_str() function. > > Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon. > > > > winsup/cygwin/fhandler_tty.cc | 151 +++--- > > 1 file changed, 103 insertions(+), 48 deletions(-) > > Pushed. Thanks. > Do we really have to support UTF-7? Is anybody actually using it? > Usually Cygwin does not support UTF-7 at all. term_code_page can never be UTF-7, however, GetConsoleCP() and GetConsoleOutputCP() can be. Therefore, I think it is necessary for non-cygwin apps if pseudo console is disabled. -- Takashi Yano
Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"
Hi Corinna, On Wed, 9 Sep 2020 09:21:23 +0200 Corinna Vinschen wrote: > On Sep 8 17:16, Takashi Yano via Cygwin-patches wrote: > > On Mon, 7 Sep 2020 23:17:36 +0200 (CEST) > > Johannes Schindelin wrote: > > > Hi Takashi, > > > > > > On Sat, 5 Sep 2020, Takashi Yano wrote: > > > > > > > On Fri, 4 Sep 2020 08:23:42 +0200 (CEST) > > > > Johannes Schindelin wrote: > > > > > > > > > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote: > > > > > > > > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST) > > > > > > Johannes Schindelin wrote: > > > > > > [...] > > > `LANG=en_US.UTF-8` (meaning your patches force their console applications' > > > output to be interpreted with code page 437) and therefore for those > > > users, things looked fine before, and now they don't. > > > > > > Note that I am not talking about developers who develop said console > > > applications. I am talking about users who use those console applications. > > > In other words, I am talking about a vastly larger group of affected > > > people. > > > > > > All of those people (or at least a substantial majority) will now have to > > > be told to please disable Pseudo Console support in v3.2.0 because they > > > would have to patch and rebuild those console applications that don't call > > > `SetConsoleOutputCP()`, and that is certainly unreasonable to expect of > > > the majority of users. Not even the `cmd /c chcp 65001` work-around (that > > > helps with v3.1.7) will work with v3.2.0 when Pseudo Console support is > > > enabled. > > > > In the case where pseudo console is disabled, the patch I proposed in > > https://cygwin.com/pipermail/cygwin-patches/2020q3/010548.html > > will solve the issue. I mean apps which work correctly in cygwin 3.0.7 > > work in cygwin 3.2.0 as well with that patch. > > > > OTOH, in the case where pseudo console is enabled, non-cygwin apps which > > work correctly in command prompt, work in cygwin 3.2.0 as well. > > > > It is enough for all, isn't it? > > > > You may think that all non-cygwin apps which work in cygwin 3.0.7 must > > work in cygwin 3.2.0 even when pseudo console is enabled, but it is > > against the purpose of the pseudo console support. The goal of pseudo > > console support is to make non-cygwin apps work as if it is executed in > > command prompt. > > > > By the way, you complained regarding garbled output of the program which > > outputs UTF-8 string regardless of locale. > > https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html > > However, many Japanese programmers, for example, make programs which > > output SJIS (CP932) string regardless of locale. > > > > Why do you think the former must work while the latter deos not have to? > > Is there any reasonable reason other than backward compatibility? > > Is that still a concern with the latest from master? There's > a snapshot for testing, Johannes. We are still discussing about that. https://github.com/msys2/MSYS2-packages/issues/1974 > Takashi, does the patch from > https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html > still apply to the latest from master? Question is, shouldn't the > Windows calls setting the codepage be only called if started from > child_info_spawn::worker for non-Cygwin executables? I'd propose the patch: diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc index 37d033bbe..95b28c3da 100644 --- a/winsup/cygwin/fhandler_tty.cc +++ b/winsup/cygwin/fhandler_tty.cc @@ -1830,7 +1830,11 @@ fhandler_pty_slave::setup_locale (void) extern UINT __eval_codepage_from_internal_charset (); if (!get_ttyp ()->term_code_page) -get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset (); +{ + get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset (); + SetConsoleCP (get_ttyp ()->term_code_page); + SetConsoleOutputCP (get_ttyp ()->term_code_page); +} } void However, Johannes insists setting codepage for non-cygwin apps even when pseudo console is enabled, which I cannot agree. Actually, I hesitate even the patch above, however, it seems to be necessary for msys apps in terms of backward compatibility. -- Takashi Yano