Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-09 Thread Corinna Vinschen
On Sep  8 17:16, Takashi Yano via Cygwin-patches wrote:
> On Mon, 7 Sep 2020 23:17:36 +0200 (CEST)
> Johannes Schindelin wrote:
> > Hi Takashi,
> > 
> > On Sat, 5 Sep 2020, Takashi Yano wrote:
> > 
> > > On Fri, 4 Sep 2020 08:23:42 +0200 (CEST)
> > > Johannes Schindelin wrote:
> > > >
> > > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote:
> > > >
> > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST)
> > > > > Johannes Schindelin wrote:
> > > > > [...]
> > `LANG=en_US.UTF-8` (meaning your patches force their console applications'
> > output to be interpreted with code page 437) and therefore for those
> > users, things looked fine before, and now they don't.
> > 
> > Note that I am not talking about developers who develop said console
> > applications. I am talking about users who use those console applications.
> > In other words, I am talking about a vastly larger group of affected
> > people.
> > 
> > All of those people (or at least a substantial majority) will now have to
> > be told to please disable Pseudo Console support in v3.2.0 because they
> > would have to patch and rebuild those console applications that don't call
> > `SetConsoleOutputCP()`, and that is certainly unreasonable to expect of
> > the majority of users. Not even the `cmd /c chcp 65001` work-around (that
> > helps with v3.1.7) will work with v3.2.0 when Pseudo Console support is
> > enabled.
> 
> In the case where pseudo console is disabled, the patch I proposed in
> https://cygwin.com/pipermail/cygwin-patches/2020q3/010548.html
> will solve the issue. I mean apps which work correctly in cygwin 3.0.7
> work in cygwin 3.2.0 as well with that patch.
> 
> OTOH, in the case where pseudo console is enabled, non-cygwin apps which
> work correctly in command prompt, work in cygwin 3.2.0 as well.
> 
> It is enough for all, isn't it?
> 
> You may think that all non-cygwin apps which work in cygwin 3.0.7 must
> work in cygwin 3.2.0 even when pseudo console is enabled, but it is
> against the purpose of the pseudo console support. The goal of pseudo
> console support is to make non-cygwin apps work as if it is executed in
> command prompt.
> 
> By the way, you complained regarding garbled output of the program which
> outputs UTF-8 string regardless of locale.
> https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html
> However, many Japanese programmers, for example, make programs which
> output SJIS (CP932) string regardless of locale.
> 
> Why do you think the former must work while the latter deos not have to?
> Is there any reasonable reason other than backward compatibility?

Is that still a concern with the latest from master?  There's
a snapshot for testing, Johannes.

Takashi, does the patch from
https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html
still apply to the latest from master?  Question is, shouldn't the
Windows calls setting the codepage be only called if started from
child_info_spawn::worker for non-Cygwin executables?


Corinna


Re: [PATCH] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

2020-09-09 Thread Takashi Yano via Cygwin-patches
Hi Corinna,

On Tue, 8 Sep 2020 20:42:47 +0200
Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  8 18:57, Takashi Yano via Cygwin-patches wrote:
> > - If the non-cygwin apps is executed under pseudo console disabled,
> >   multibyte input for the apps are garbled. This patch fixes the
> >   issue.
> > ---
> >  winsup/cygwin/fhandler_tty.cc | 13 -
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > index 6de591d9b..afaa4546e 100644
> > --- a/winsup/cygwin/fhandler_tty.cc
> > +++ b/winsup/cygwin/fhandler_tty.cc
> > @@ -271,8 +271,17 @@ fhandler_pty_master::accept_input ()
> >bytes_left = eat_readahead (-1);
> >  
> >HANDLE write_to = get_output_handle ();
> > +  char *buf = NULL;
> >if (to_be_read_from_pcon ())
> > -write_to = to_slave;
> > +{
> > +  write_to = to_slave;
> > +  size_t nlen;
> > +  buf = convert_mb_str (GetConsoleCP (), &nlen,
> > +   get_ttyp ()->term_code_page,
> > +   (const char *) p, bytes_left);
> > +  p = buf;
> > +  bytes_left = nlen;
> > +}
> 
> How big are chances that the string in p is larger than 32767 chars?
> 
> I'd like to see convert_mb_str use a tmp_pathbuf buffer instead of
> calling HeapAlloc/HeapFree every time.  That also drops the mb_str_free
> entirely.
> 
> Isn't there a problem anyway with calling convert_mb_str?  Consider
> a write call which stops in the middle of a multibyte char, the
> second half only sent with the next write call.  convert_mb_str
> only allows to convert complete multibyte chars, and the caller does
> not keep something like an mbstate_t around, which would allow
> continuation of split multibyte chars.

Thanks for the advice. I will submit a series of patches which
reflect your advice.

-- 
Takashi Yano 


[PATCH 2/2] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

2020-09-09 Thread Takashi Yano via Cygwin-patches
- If the non-cygwin apps is executed under pseudo console disabled,
  multibyte input for the apps are garbled. This patch fixes the
  issue.
---
 winsup/cygwin/fhandler_tty.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index e7485af72..7fa8c99e8 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -309,8 +309,18 @@ fhandler_pty_master::accept_input ()
   bytes_left = eat_readahead (-1);
 
   HANDLE write_to = get_output_handle ();
+  tmp_pathbuf tp;
+  char *mbbuf = tp.c_get ();
   if (to_be_read_from_pcon ())
-write_to = to_slave;
+{
+  write_to = to_slave;
+  static mbpend_t mbpend;
+  size_t nlen = NT_MAX_PATH;
+  convert_mb_str (GetConsoleCP (), mbbuf, &nlen,
+ get_ttyp ()->term_code_page, p, bytes_left, &mbpend);
+  p = mbbuf;
+  bytes_left = nlen;
+}
 
   if (!bytes_left)
 {
-- 
2.28.0



[PATCH 0/2] Cygwin: pty: Changes regarding charset conversion.

2020-09-09 Thread Takashi Yano via Cygwin-patches
Takashi Yano (2):
  Cygwin: pty: Revise convert_mb_str() function.
  Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

 winsup/cygwin/fhandler_tty.cc | 135 +++---
 1 file changed, 93 insertions(+), 42 deletions(-)

-- 
2.28.0



[PATCH 1/2] Cygwin: pty: Revise convert_mb_str() function.

2020-09-09 Thread Takashi Yano via Cygwin-patches
- Use tmp_pathbuf instead of HeapAlloc()/HeapFree().
- Remove mb_str_free() function.
- Consider the case where the multibyte string stops in the middle
  of a multibyte char.
---
 winsup/cygwin/fhandler_tty.cc | 123 ++
 1 file changed, 82 insertions(+), 41 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 6de591d9b..e7485af72 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -26,6 +26,7 @@ details. */
 #include 
 #include "cygwait.h"
 #include "registry.h"
+#include "tls_pbuf.h"
 
 #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE
 #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016
@@ -116,40 +117,77 @@ CreateProcessW_Hooked
   return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
 }
 
-static char *
-convert_mb_str (UINT cp_to, size_t *len_to,
-   UINT cp_from, const char *ptr_from, size_t len_from)
+typedef struct {
+  int len;
+  char buf[5];
+} mbpend_t;
+
+static void
+convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
+   UINT cp_from, const char *ptr_from, size_t len_from,
+   mbpend_t *pend)
 {
-  char *buf;
   size_t nlen;
+  tmp_pathbuf tp;
   if (cp_to != cp_from)
 {
-  size_t wlen =
-   MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0);
-  wchar_t *wbuf = (wchar_t *)
-   HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t));
-  wlen =
-   MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen);
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL);
-  buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen);
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, NULL);
-  HeapFree (GetProcessHeap (), 0, wbuf);
+  wchar_t *wbuf = tp.w_get ();
+  int wlen = 0;
+  if (cp_from == 65000) /* UTF-7 */
+   /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+  Therefore, just convert string without checking */
+   wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
+   wbuf, NT_MAX_PATH);
+  else
+   {
+ char *tmpbuf = tp.c_get ();
+ memcpy (tmpbuf, pend->buf, pend->len);
+ if (pend->len + len_from > NT_MAX_PATH)
+   len_from = NT_MAX_PATH - pend->len;
+ memcpy (tmpbuf + pend->len, ptr_from, len_from);
+ int total_len = pend->len + len_from;
+ pend->len = 0;
+ int mblen = 0;
+ for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen)
+   /* Max bytes in multibyte char is 6. */
+   for (mblen = 1; mblen <= 6; mblen ++)
+ {
+   /* Try conversion */
+   int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS,
+p, mblen,
+wbuf + wlen, NT_MAX_PATH - wlen);
+   if (l)
+ { /* Conversion Success */
+   wlen += l;
+   break;
+ }
+   else if (mblen == 6)
+ { /* Conversion Fail */
+   l = MultiByteToWideChar (cp_from, 0, p, 1,
+wbuf + wlen, NT_MAX_PATH - wlen);
+   wlen += l;
+   mblen = 1;
+   break;
+ }
+   else if (p + mblen == tmpbuf + total_len)
+ { /* Multibyte char incomplete */
+   memcpy (pend->buf, p, mblen);
+   pend->len = mblen;
+   break;
+ }
+   /* Retry conversion with extended length */
+ }
+   }
+  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
+ ptr_to, *len_to, NULL, NULL);
 }
   else
 {
   /* Just copy */
-  buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from);
-  memcpy (buf, ptr_from, len_from);
-  nlen = len_from;
+  nlen = min (*len_to, len_from);
+  memcpy (ptr_to, ptr_from, nlen);
 }
   *len_to = nlen;
-  return buf;
-}
-
-static void
-mb_str_free (char *ptr)
-{
-  HeapFree (GetProcessHeap (), 0, ptr);
 }
 
 static bool
@@ -1613,9 +1651,13 @@ fhandler_pty_master::write (const void *ptr, size_t len)
  if current application is native console application. */
   if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console)
 {
-  size_t nlen;
-  char *buf = convert_mb_str (CP_UTF8, &nlen, get_ttyp ()->term_code_page,
- (const char *) ptr, len);
+  tmp_pathbuf tp;
+  char *buf = tp.c_get ();
+  static mbpend_t mbpend;
+  size_t nlen = NT_MAX_PATH;
+  convert_mb_str (CP_UTF8, buf, &nlen,
+ get_ttyp ()->term_code_page, (const char *) ptr, len,
+ &mbpend);
 
   WaitForSingleObject (input_mutex, INFINITE);
 
@

Re: [PATCH 1/2] Cygwin: pty: Revise convert_mb_str() function.

2020-09-09 Thread Takashi Yano via Cygwin-patches
On Wed,  9 Sep 2020 17:07:20 +0900
Takashi Yano wrote:
> + /* Max bytes in multibyte char is 6. */
> + for (mblen = 1; mblen <= 6; mblen ++)

Sorry, I misunderstood that the max utf-8 char length is 6.
Actually, it is 4, therefore mbstate_t can be used.

I will submit v2 patch.

-- 
Takashi Yano 


[PATCH v2 1/2] Cygwin: pty: Revise convert_mb_str() function.

2020-09-09 Thread Takashi Yano via Cygwin-patches
- Use tmp_pathbuf instead of HeapAlloc()/HeapFree().
- Remove mb_str_free() function.
- Consider the case where the multibyte string stops in the middle
  of a multibyte char.
---
 winsup/cygwin/fhandler_tty.cc | 118 ++
 1 file changed, 77 insertions(+), 41 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 6de591d9b..c4b7fc61d 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -26,6 +26,7 @@ details. */
 #include 
 #include "cygwait.h"
 #include "registry.h"
+#include "tls_pbuf.h"
 
 #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE
 #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016
@@ -116,40 +117,72 @@ CreateProcessW_Hooked
   return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
 }
 
-static char *
-convert_mb_str (UINT cp_to, size_t *len_to,
-   UINT cp_from, const char *ptr_from, size_t len_from)
+static void
+convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
+   UINT cp_from, const char *ptr_from, size_t len_from,
+   mbstate_t *stat)
 {
-  char *buf;
   size_t nlen;
+  tmp_pathbuf tp;
   if (cp_to != cp_from)
 {
-  size_t wlen =
-   MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0);
-  wchar_t *wbuf = (wchar_t *)
-   HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t));
-  wlen =
-   MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen);
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL);
-  buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen);
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, NULL);
-  HeapFree (GetProcessHeap (), 0, wbuf);
+  wchar_t *wbuf = tp.w_get ();
+  int wlen = 0;
+  if (cp_from == 65000) /* UTF-7 */
+   /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+  Therefore, just convert string without checking */
+   wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
+   wbuf, NT_MAX_PATH);
+  else
+   {
+ char *tmpbuf = tp.c_get ();
+ memcpy (tmpbuf, stat->__value.__wchb, stat->__count);
+ if (stat->__count + len_from > NT_MAX_PATH)
+   len_from = NT_MAX_PATH - stat->__count;
+ memcpy (tmpbuf + stat->__count, ptr_from, len_from);
+ int total_len = stat->__count + len_from;
+ stat->__count = 0;
+ int mblen = 0;
+ for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen)
+   /* Max bytes in multibyte char is 4. */
+   for (mblen = 1; mblen <= 4; mblen ++)
+ {
+   /* Try conversion */
+   int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS,
+p, mblen,
+wbuf + wlen, NT_MAX_PATH - wlen);
+   if (l)
+ { /* Conversion Success */
+   wlen += l;
+   break;
+ }
+   else if (mblen == 4)
+ { /* Conversion Fail */
+   l = MultiByteToWideChar (cp_from, 0, p, 1,
+wbuf + wlen, NT_MAX_PATH - wlen);
+   wlen += l;
+   mblen = 1;
+   break;
+ }
+   else if (p + mblen == tmpbuf + total_len)
+ { /* Multibyte char incomplete */
+   memcpy (stat->__value.__wchb, p, mblen);
+   stat->__count = mblen;
+   break;
+ }
+   /* Retry conversion with extended length */
+ }
+   }
+  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
+ ptr_to, *len_to, NULL, NULL);
 }
   else
 {
   /* Just copy */
-  buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from);
-  memcpy (buf, ptr_from, len_from);
-  nlen = len_from;
+  nlen = min (*len_to, len_from);
+  memcpy (ptr_to, ptr_from, nlen);
 }
   *len_to = nlen;
-  return buf;
-}
-
-static void
-mb_str_free (char *ptr)
-{
-  HeapFree (GetProcessHeap (), 0, ptr);
 }
 
 static bool
@@ -1613,9 +1646,13 @@ fhandler_pty_master::write (const void *ptr, size_t len)
  if current application is native console application. */
   if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console)
 {
-  size_t nlen;
-  char *buf = convert_mb_str (CP_UTF8, &nlen, get_ttyp ()->term_code_page,
- (const char *) ptr, len);
+  tmp_pathbuf tp;
+  char *buf = tp.c_get ();
+  static mbstate_t mbstat;
+  size_t nlen = NT_MAX_PATH;
+  convert_mb_str (CP_UTF8, buf, &nlen,
+ get_ttyp ()->term_code_page, (const char *) ptr, len,
+ &mbstat);
 
   WaitForSingleObject (input_mutex, INFINITE);
 
@@ -1650,7

[PATCH v2 2/2] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

2020-09-09 Thread Takashi Yano via Cygwin-patches
- If the non-cygwin apps is executed under pseudo console disabled,
  multibyte input for the apps are garbled. This patch fixes the
  issue.
---
 winsup/cygwin/fhandler_tty.cc | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index c4b7fc61d..701381ea6 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -304,8 +304,18 @@ fhandler_pty_master::accept_input ()
   bytes_left = eat_readahead (-1);
 
   HANDLE write_to = get_output_handle ();
+  tmp_pathbuf tp;
+  char *mbbuf = tp.c_get ();
   if (to_be_read_from_pcon ())
-write_to = to_slave;
+{
+  write_to = to_slave;
+  static mbstate_t mbstat;
+  size_t nlen = NT_MAX_PATH;
+  convert_mb_str (GetConsoleCP (), mbbuf, &nlen,
+ get_ttyp ()->term_code_page, p, bytes_left, &mbstat);
+  p = mbbuf;
+  bytes_left = nlen;
+}
 
   if (!bytes_left)
 {
-- 
2.28.0



[PATCH v2 0/2] Cygwin: pty: Changes regarding charset conversion.

2020-09-09 Thread Takashi Yano via Cygwin-patches
Takashi Yano (2):
  Cygwin: pty: Revise convert_mb_str() function.
  Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

 winsup/cygwin/fhandler_tty.cc | 130 +++---
 1 file changed, 88 insertions(+), 42 deletions(-)

-- 
2.28.0



Re: [PATCH v2 1/2] Cygwin: pty: Revise convert_mb_str() function.

2020-09-09 Thread Corinna Vinschen
On Sep  9 22:40, Takashi Yano via Cygwin-patches wrote:
> - Use tmp_pathbuf instead of HeapAlloc()/HeapFree().
> - Remove mb_str_free() function.
> - Consider the case where the multibyte string stops in the middle
>   of a multibyte char.
> ---
>  winsup/cygwin/fhandler_tty.cc | 118 ++
>  1 file changed, 77 insertions(+), 41 deletions(-)
> 
> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> index 6de591d9b..c4b7fc61d 100644
> --- a/winsup/cygwin/fhandler_tty.cc
> +++ b/winsup/cygwin/fhandler_tty.cc
> @@ -26,6 +26,7 @@ details. */
>  #include 
>  #include "cygwait.h"
>  #include "registry.h"
> +#include "tls_pbuf.h"
>  
>  #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE
>  #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016
> @@ -116,40 +117,72 @@ CreateProcessW_Hooked
>return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
>  }
>  
> -static char *
> -convert_mb_str (UINT cp_to, size_t *len_to,
> - UINT cp_from, const char *ptr_from, size_t len_from)
> +static void
> +convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> + UINT cp_from, const char *ptr_from, size_t len_from,
> + mbstate_t *stat)

Heh, it's funny to use mbstate_t together with Windows functions :)
However, the var name `stat' makes me itch.  It looks too close to
struct stat and function stat.  What about "mbp" or something along
these lines?

>  {
> -  char *buf;
>size_t nlen;
> +  tmp_pathbuf tp;
>if (cp_to != cp_from)
>  {
> -  size_t wlen =
> - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0);
> -  wchar_t *wbuf = (wchar_t *)
> - HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t));
> -  wlen =
> - MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen);
> -  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL);
> -  buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen);
> -  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, 
> NULL);
> -  HeapFree (GetProcessHeap (), 0, wbuf);
> +  wchar_t *wbuf = tp.w_get ();
> +  int wlen = 0;
> +  if (cp_from == 65000) /* UTF-7 */

CP_UTF7?

> + /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> +Therefore, just convert string without checking */
> + wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> + wbuf, NT_MAX_PATH);
> +  else
> + {
> +   char *tmpbuf = tp.c_get ();
> +   memcpy (tmpbuf, stat->__value.__wchb, stat->__count);
> +   if (stat->__count + len_from > NT_MAX_PATH)
> + len_from = NT_MAX_PATH - stat->__count;
> +   memcpy (tmpbuf + stat->__count, ptr_from, len_from);
> +   int total_len = stat->__count + len_from;
> +   stat->__count = 0;
> +   int mblen = 0;
> +   for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen)
> + /* Max bytes in multibyte char is 4. */
> + for (mblen = 1; mblen <= 4; mblen ++)
> +   {
> + /* Try conversion */
> + int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS,
> +  p, mblen,
> +  wbuf + wlen, NT_MAX_PATH - wlen);
> + if (l)
> +   { /* Conversion Success */
> + wlen += l;
> + break;
> +   }
> + else if (mblen == 4)
> +   { /* Conversion Fail */
> + l = MultiByteToWideChar (cp_from, 0, p, 1,
> +  wbuf + wlen, NT_MAX_PATH - wlen);
> + wlen += l;
> + mblen = 1;
> + break;
> +   }
> + else if (p + mblen == tmpbuf + total_len)
> +   { /* Multibyte char incomplete */
> + memcpy (stat->__value.__wchb, p, mblen);
> + stat->__count = mblen;
> + break;
> +   }
> + /* Retry conversion with extended length */
> +   }
> + }
> +  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
> +   ptr_to, *len_to, NULL, NULL);
>  }
>else
>  {
>/* Just copy */
> -  buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from);
> -  memcpy (buf, ptr_from, len_from);
> -  nlen = len_from;
> +  nlen = min (*len_to, len_from);
> +  memcpy (ptr_to, ptr_from, nlen);

if the *caller* checks for cp_to != cp_from, this memcpy could go
away and the caller could just use the already existing buffer.

Other than that, LGTM.


Thanks,
Corinna


[PATCH v3 0/2] Cygwin: pty: Changes regarding charset conversion.

2020-09-09 Thread Takashi Yano via Cygwin-patches
Takashi Yano (2):
  Cygwin: pty: Revise convert_mb_str() function.
  Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

 winsup/cygwin/fhandler_tty.cc | 151 +++---
 1 file changed, 103 insertions(+), 48 deletions(-)

-- 
2.28.0



[PATCH v3 2/2] Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.

2020-09-09 Thread Takashi Yano via Cygwin-patches
- If the non-cygwin apps is executed under pseudo console disabled,
  multibyte input for the apps are garbled. This patch fixes the
  issue.
---
 winsup/cygwin/fhandler_tty.cc | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 0bfc32ea9..1b021661f 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -295,8 +295,22 @@ fhandler_pty_master::accept_input ()
   bytes_left = eat_readahead (-1);
 
   HANDLE write_to = get_output_handle ();
+  tmp_pathbuf tp;
   if (to_be_read_from_pcon ())
-write_to = to_slave;
+{
+  write_to = to_slave;
+  UINT cp_to = GetConsoleCP ();
+  if (get_ttyp ()->term_code_page != cp_to)
+   {
+ static mbstate_t mbp;
+ char *mbbuf = tp.c_get ();
+ size_t nlen = NT_MAX_PATH;
+ convert_mb_str (cp_to, mbbuf, &nlen,
+ get_ttyp ()->term_code_page, p, bytes_left, &mbp);
+ p = mbbuf;
+ bytes_left = nlen;
+   }
+}
 
   if (!bytes_left)
 {
-- 
2.28.0



[PATCH v3 1/2] Cygwin: pty: Revise convert_mb_str() function.

2020-09-09 Thread Takashi Yano via Cygwin-patches
- Use tmp_pathbuf instead of HeapAlloc()/HeapFree().
- Remove mb_str_free() function.
- Consider the case where the multibyte string stops in the middle
  of a multibyte char.
---
 winsup/cygwin/fhandler_tty.cc | 135 ++
 1 file changed, 88 insertions(+), 47 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 6de591d9b..0bfc32ea9 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -26,6 +26,7 @@ details. */
 #include 
 #include "cygwait.h"
 #include "registry.h"
+#include "tls_pbuf.h"
 
 #ifndef PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE
 #define PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE 0x00020016
@@ -116,40 +117,63 @@ CreateProcessW_Hooked
   return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
 }
 
-static char *
-convert_mb_str (UINT cp_to, size_t *len_to,
-   UINT cp_from, const char *ptr_from, size_t len_from)
+static void
+convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
+   UINT cp_from, const char *ptr_from, size_t len_from,
+   mbstate_t *mbp)
 {
-  char *buf;
   size_t nlen;
-  if (cp_to != cp_from)
-{
-  size_t wlen =
-   MultiByteToWideChar (cp_from, 0, ptr_from, len_from, NULL, 0);
-  wchar_t *wbuf = (wchar_t *)
-   HeapAlloc (GetProcessHeap (), 0, wlen * sizeof (wchar_t));
-  wlen =
-   MultiByteToWideChar (cp_from, 0, ptr_from, len_from, wbuf, wlen);
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, NULL, 0, NULL, NULL);
-  buf = (char *) HeapAlloc (GetProcessHeap (), 0, nlen);
-  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen, buf, nlen, NULL, NULL);
-  HeapFree (GetProcessHeap (), 0, wbuf);
-}
+  tmp_pathbuf tp;
+  wchar_t *wbuf = tp.w_get ();
+  int wlen = 0;
+  if (cp_from == CP_UTF7)
+/* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+   Therefore, just convert string without checking */
+wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
+   wbuf, NT_MAX_PATH);
   else
 {
-  /* Just copy */
-  buf = (char *) HeapAlloc (GetProcessHeap (), 0, len_from);
-  memcpy (buf, ptr_from, len_from);
-  nlen = len_from;
+  char *tmpbuf = tp.c_get ();
+  memcpy (tmpbuf, mbp->__value.__wchb, mbp->__count);
+  if (mbp->__count + len_from > NT_MAX_PATH)
+   len_from = NT_MAX_PATH - mbp->__count;
+  memcpy (tmpbuf + mbp->__count, ptr_from, len_from);
+  int total_len = mbp->__count + len_from;
+  mbp->__count = 0;
+  int mblen = 0;
+  for (const char *p = tmpbuf; p < tmpbuf + total_len; p += mblen)
+   /* Max bytes in multibyte char is 4. */
+   for (mblen = 1; mblen <= 4; mblen ++)
+ {
+   /* Try conversion */
+   int l = MultiByteToWideChar (cp_from, MB_ERR_INVALID_CHARS,
+p, mblen,
+wbuf + wlen, NT_MAX_PATH - wlen);
+   if (l)
+ { /* Conversion Success */
+   wlen += l;
+   break;
+ }
+   else if (mblen == 4)
+ { /* Conversion Fail */
+   l = MultiByteToWideChar (cp_from, 0, p, 1,
+wbuf + wlen, NT_MAX_PATH - wlen);
+   wlen += l;
+   mblen = 1;
+   break;
+ }
+   else if (p + mblen == tmpbuf + total_len)
+ { /* Multibyte char incomplete */
+   memcpy (mbp->__value.__wchb, p, mblen);
+   mbp->__count = mblen;
+   break;
+ }
+   /* Retry conversion with extended length */
+ }
 }
+  nlen = WideCharToMultiByte (cp_to, 0, wbuf, wlen,
+ ptr_to, *len_to, NULL, NULL);
   *len_to = nlen;
-  return buf;
-}
-
-static void
-mb_str_free (char *ptr)
-{
-  HeapFree (GetProcessHeap (), 0, ptr);
 }
 
 static bool
@@ -1613,9 +1637,18 @@ fhandler_pty_master::write (const void *ptr, size_t len)
  if current application is native console application. */
   if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console)
 {
-  size_t nlen;
-  char *buf = convert_mb_str (CP_UTF8, &nlen, get_ttyp ()->term_code_page,
- (const char *) ptr, len);
+  tmp_pathbuf tp;
+  char *buf = (char *) ptr;
+  size_t nlen = len;
+  if (get_ttyp ()->term_code_page != CP_UTF8)
+   {
+ static mbstate_t mbp;
+ buf = tp.c_get ();
+ nlen = NT_MAX_PATH;
+ convert_mb_str (CP_UTF8, buf, &nlen,
+ get_ttyp ()->term_code_page, (const char *) ptr, len,
+ &mbp);
+   }
 
   WaitForSingleObject (input_mutex, INFINITE);
 
@@ -1650,7 +1683,6 @@ fhandler_pty_master::write (const void *ptr, size_t len)
  get_ttyp ()->pcon_start = false;
}
  ReleaseMutex (input_mutex);
-

Re: [PATCH v3 0/2] Cygwin: pty: Changes regarding charset conversion.

2020-09-09 Thread Corinna Vinschen
On Sep 10 00:27, Takashi Yano via Cygwin-patches wrote:
> Takashi Yano (2):
>   Cygwin: pty: Revise convert_mb_str() function.
>   Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
> 
>  winsup/cygwin/fhandler_tty.cc | 151 +++---
>  1 file changed, 103 insertions(+), 48 deletions(-)

Pushed.

Do we really have to support UTF-7?  Is anybody actually using it?
Usually Cygwin does not support UTF-7 at all.


Thanks,
Corinna


Re: [PATCH v3 0/2] Cygwin: pty: Changes regarding charset conversion.

2020-09-09 Thread Takashi Yano via Cygwin-patches
On Wed, 9 Sep 2020 23:46:04 +0200
Corinna Vinschen wrote:
> On Sep 10 00:27, Takashi Yano via Cygwin-patches wrote:
> > Takashi Yano (2):
> >   Cygwin: pty: Revise convert_mb_str() function.
> >   Cygwin: pty: Fix input charset for non-cygwin apps with disable_pcon.
> > 
> >  winsup/cygwin/fhandler_tty.cc | 151 +++---
> >  1 file changed, 103 insertions(+), 48 deletions(-)
> 
> Pushed.

Thanks.

> Do we really have to support UTF-7?  Is anybody actually using it?
> Usually Cygwin does not support UTF-7 at all.

term_code_page can never be UTF-7, however, GetConsoleCP() and
GetConsoleOutputCP() can be. Therefore, I think it is necessary
for non-cygwin apps if pseudo console is disabled.

-- 
Takashi Yano 


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-09 Thread Takashi Yano via Cygwin-patches
Hi Corinna,

On Wed, 9 Sep 2020 09:21:23 +0200
Corinna Vinschen wrote:
> On Sep  8 17:16, Takashi Yano via Cygwin-patches wrote:
> > On Mon, 7 Sep 2020 23:17:36 +0200 (CEST)
> > Johannes Schindelin wrote:
> > > Hi Takashi,
> > > 
> > > On Sat, 5 Sep 2020, Takashi Yano wrote:
> > > 
> > > > On Fri, 4 Sep 2020 08:23:42 +0200 (CEST)
> > > > Johannes Schindelin wrote:
> > > > >
> > > > > On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote:
> > > > >
> > > > > > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST)
> > > > > > Johannes Schindelin wrote:
> > > > > > [...]
> > > `LANG=en_US.UTF-8` (meaning your patches force their console applications'
> > > output to be interpreted with code page 437) and therefore for those
> > > users, things looked fine before, and now they don't.
> > > 
> > > Note that I am not talking about developers who develop said console
> > > applications. I am talking about users who use those console applications.
> > > In other words, I am talking about a vastly larger group of affected
> > > people.
> > > 
> > > All of those people (or at least a substantial majority) will now have to
> > > be told to please disable Pseudo Console support in v3.2.0 because they
> > > would have to patch and rebuild those console applications that don't call
> > > `SetConsoleOutputCP()`, and that is certainly unreasonable to expect of
> > > the majority of users. Not even the `cmd /c chcp 65001` work-around (that
> > > helps with v3.1.7) will work with v3.2.0 when Pseudo Console support is
> > > enabled.
> > 
> > In the case where pseudo console is disabled, the patch I proposed in
> > https://cygwin.com/pipermail/cygwin-patches/2020q3/010548.html
> > will solve the issue. I mean apps which work correctly in cygwin 3.0.7
> > work in cygwin 3.2.0 as well with that patch.
> > 
> > OTOH, in the case where pseudo console is enabled, non-cygwin apps which
> > work correctly in command prompt, work in cygwin 3.2.0 as well.
> > 
> > It is enough for all, isn't it?
> > 
> > You may think that all non-cygwin apps which work in cygwin 3.0.7 must
> > work in cygwin 3.2.0 even when pseudo console is enabled, but it is
> > against the purpose of the pseudo console support. The goal of pseudo
> > console support is to make non-cygwin apps work as if it is executed in
> > command prompt.
> > 
> > By the way, you complained regarding garbled output of the program which
> > outputs UTF-8 string regardless of locale.
> > https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html
> > However, many Japanese programmers, for example, make programs which
> > output SJIS (CP932) string regardless of locale.
> > 
> > Why do you think the former must work while the latter deos not have to?
> > Is there any reasonable reason other than backward compatibility?
> 
> Is that still a concern with the latest from master?  There's
> a snapshot for testing, Johannes.

We are still discussing about that.
https://github.com/msys2/MSYS2-packages/issues/1974

> Takashi, does the patch from
> https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html
> still apply to the latest from master?  Question is, shouldn't the
> Windows calls setting the codepage be only called if started from
> child_info_spawn::worker for non-Cygwin executables?

I'd propose the patch:

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 37d033bbe..95b28c3da 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -1830,7 +1830,11 @@ fhandler_pty_slave::setup_locale (void)
   extern UINT __eval_codepage_from_internal_charset ();

   if (!get_ttyp ()->term_code_page)
-get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset ();
+{
+  get_ttyp ()->term_code_page = __eval_codepage_from_internal_charset ();
+  SetConsoleCP (get_ttyp ()->term_code_page);
+  SetConsoleOutputCP (get_ttyp ()->term_code_page);
+}
 }

 void

However, Johannes insists setting codepage for non-cygwin apps
even when pseudo console is enabled, which I cannot agree.

Actually, I hesitate even the patch above, however, it seems to
be necessary for msys apps in terms of backward compatibility.

-- 
Takashi Yano