Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Takashi Yano via Cygwin-patches
Hi Corinna,

On Thu, 3 Sep 2020 19:59:12 +0200
Corinna Vinschen wrote:
> On Sep  2 18:38, Corinna Vinschen wrote:
> > Hi Takashi,
> > 
> > On Sep  3 01:25, Takashi Yano via Cygwin-patches wrote:
> > > Hi Corinna,
> > > 
> > > On Wed, 2 Sep 2020 17:24:50 +0200
> > > Corinna Vinschen  wrote:
> > > > > > get_locale_from_env() and get_langinfo() should go away.  If we just
> > > > > > need a codepage for get_ttyp ()->term_code_page, we should really 
> > > > > > find a
> > > > > > way to do this from within internal_setlocale().
> > > > > 
> > > > > I looked into internal_setlocale() code, but I could not found
> > > > > the code which handles thecode page. I found the code handling
> > > > > the code page in __set_charset_from_locale() function in nlsfuncs.cc,
> > > > > but it does not return code page itself. Could you please explain
> > > > > more detail of your idea?
> > > > 
> > > > I had none yet :)  I was just musing, without actually thinking about a
> > > > solution.  But I think this isn't very complicated.  Given this is
> > > > inside Cygwin, nothing keeps the function to have a well-defined
> > > > side-effect, as in setting a (not yet existing) member "term_code_page"
> > > > of cygheap->locale.
> > > > 
> > > > Kind of like this:
> > > > [...]
> > > I have tried your code, however, it does not work as expected.
> > > It seems that __set_charset_from_locale() is not called.
> > > cygheap->locale.term_code_page is always 0.
> > 
> > Ah, right!  Take a look into newlib/libc/locale/locale.c, function
> > __loadlocale().  This function is called from _setlocale_r().  However,
> > it calls __set_charset_from_locale() *only* if the charset isn't already
> > given explicitely in the LC_* or LANG string, because otherwise we
> > already know the charset, after all.
> > 
> > Darn!  That foils my plans for world domination...
> > 
> > > Let me consider a while.
> > 
> > Thanks, I'll do the same.  I'd really like to simplify this stuff
> > and doing the locale shuffle in two entirely different locations
> > at different times is prone to getting out of sync.
> 
> The only idea I had so far was, changing the way __set_charset_from_locale
> works from within _setlocale_r:
> 
> We could add a Cygwin-specific function only fetching the codepage and
> call it unconditionally from _setlocale_r.  __set_charset_from_locale is
> called with a new parameter "codepage", so it doesn't have to fetch the
> CP by itself, but it's still only called from _setlocale_r if necessary.
> 
> Would that be sufficient?  The CP conversion from 20127/ASCII to 65001/UTF8
> could be done at the point the codepage is actually required.

I think I have found the answer to your request.
Patch attached. What do you think of this patch?

Calling initial_setlocale() is necessary because
nl_langinfo() always returns "ANSI_X3.4-1968"
regardless locale setting if this is not called.

-- 
Takashi Yano 


0001-Cygwin-pty-Replace-pty-specific-locale-functions-wit.patch
Description: Binary data


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Takashi Yano via Cygwin-patches
Hi Johannes and Corinna,

On Tue, 1 Sep 2020 18:19:16 +0200 (CEST)
Johannes Schindelin wrote:

> When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is
> correct, but after that (at least if Pseudo Console support is enabled),
> we try to find the default code page for that `LCID`, which is ASCII
> (437). Subsequently, we set the Console output code page to that value,
> completely ignoring that we wanted to use UTF-8.
> 
> Let's not ignore the specifically asked-for UTF-8 character set.
> 
> While at it, let's also set the Console output code page even if Pseudo
> Console support is disabled; contrary to the behavior of v3.0.7, the
> Console output code page is not ignored in that case.
> 
> The most common symptom would be that console applications which do not
> specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text
> seem to be broken with v3.1.x when they worked plenty fine with v3.0.x.
> 
> This fixes https://github.com/msys2/MSYS2-packages/issues/1974,
> https://github.com/msys2/MSYS2-packages/issues/2012,
> https://github.com/rust-lang/cargo/issues/8369,
> https://github.com/git-for-windows/git/issues/2734,
> https://github.com/git-for-windows/git/issues/2793,
> https://github.com/git-for-windows/git/issues/2792, and possibly quite a
> few others.
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  winsup/cygwin/fhandler_tty.cc | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> index 06789a500..414c26992 100644
> --- a/winsup/cygwin/fhandler_tty.cc
> +++ b/winsup/cygwin/fhandler_tty.cc
> @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void)
>char charset[ENCODING_LEN + 1] = "ASCII";
>LCID lcid = get_langinfo (locale, charset);
> 
> +  /* Special-case the UTF-8 character set */
> +  if (strcasecmp (charset, "UTF-8") == 0)
> +{
> +  get_ttyp ()->term_code_page = CP_UTF8;
> +  SetConsoleCP (CP_UTF8);
> +  SetConsoleOutputCP (CP_UTF8);
> +  return;
> +}
> +
>/* Set console code page from locale */
>if (get_pseudo_console ())
>  {
> --
> 2.27.0

I would like to propose a counter patch attached.
What do you think of this patch?

This patch does not treat UTF-8 as special.

-- 
Takashi Yano 


0001-Cygwin-pty-Prevent-garbled-output-when-pseudo-consol.patch
Description: Binary data


[PATCH] Cygwin: create install dir for libs

2020-09-04 Thread David McFarland via Cygwin-patches
This fixes a race in parallel installs.
---
 winsup/cygwin/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/winsup/cygwin/Makefile.in b/winsup/cygwin/Makefile.in
index fac81759e..ea0243033 100644
--- a/winsup/cygwin/Makefile.in
+++ b/winsup/cygwin/Makefile.in
@@ -600,7 +600,7 @@ install: install-libs install-headers install-man 
install-ldif install_target \
 uninstall: uninstall-libs uninstall-headers uninstall-man
 
 install-libs: $(TARGET_LIBS)
-   @$(MKDIRP) $(DESTDIR)$(bindir)
+   @$(MKDIRP) $(DESTDIR)$(bindir) $(DESTDIR)$(tooldir)/lib
$(INSTALL_PROGRAM) $(TEST_DLL_NAME) $(DESTDIR)$(bindir)/$(DLL_NAME); \
for i in $^; do \
$(INSTALL_DATA) $$i $(DESTDIR)$(tooldir)/lib/`basename $$i` ; \
-- 
2.28.0



Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Corinna Vinschen
Hi Takashi,

On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> Hi Corinna,
> 
> On Thu, 3 Sep 2020 19:59:12 +0200
> Corinna Vinschen wrote:
> > The only idea I had so far was, changing the way __set_charset_from_locale
> > works from within _setlocale_r:
> > 
> > We could add a Cygwin-specific function only fetching the codepage and
> > call it unconditionally from _setlocale_r.  __set_charset_from_locale is
> > called with a new parameter "codepage", so it doesn't have to fetch the
> > CP by itself, but it's still only called from _setlocale_r if necessary.
> > 
> > Would that be sufficient?  The CP conversion from 20127/ASCII to 65001/UTF8
> > could be done at the point the codepage is actually required.
> 
> I think I have found the answer to your request.
> Patch attached. What do you think of this patch?
> 
> Calling initial_setlocale() is necessary because
> nl_langinfo() always returns "ANSI_X3.4-1968"
> regardless locale setting if this is not called.

Well, this is correct.  Per POSIX, a standard-conformant application is
running in the "C" locale unless it calls setlocale() explicitely.
That's one reason Cygwin defaults to UTF-8 internally.  Everything,
including the terminal, is supposed to default to UTF-8.  That's the
most sane default, even if an application is not locale-aware.

So, to follow POSIX, initial_setlocale() is used to set up the
environment and command line stuff and then, before calling the
application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
to reset the apps default locale to "C".  That's why nl_langinfo()
returns "ANSI_X3.4-1968".

However, the initial_setlocale() call in dll_crt0_1 calls
internal_setlocale(), and *that* function sets the conversion functions
for the internal conversions.  What it *doesn't* do yet at the moment is
to store the charset name itself or, better, the equivalent codepage.

If we change that, setup_locale can simply go away.  Below is a patch
doing just that.  Can you please check if that works in your test
scenarios?

However, there's something which worries me.  Why do we need or set the
pseudo terminal codepage at all?  I see that you convert from MB charset
to MB charset and then use the result in WriteFile to the connecting
pipes.  Question is this: Why not just converting the strings via
sys_mbstowcs to a UTF-16 string and then send that over the line, using
WriteConsoleW for the final output to the console?  That would simplify
this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
the console, we simply don't need to know or care for the console CP.


Thanks,
Corinna
>From d5dff1690d1a228579eef472441d67fb6ef20b5e Mon Sep 17 00:00:00 2001
From: Corinna Vinschen 
Date: Fri, 4 Sep 2020 14:39:58 +0200
Subject: [PATCH] Cygwin: fetch codepage for pseudo console at
 initial_setlocale time

drop locale checking in fhandler_tty code. Create function
__eval_codepage_from_internal_charset called from
internal_setlocale to store term_code_page in cygheap,
rather than in tty.  Drop now unneeded setup_locale calls
during fork and execve.

Signed-off-by: Corinna Vinschen 
---
 winsup/cygwin/cygheap.h   |   1 +
 winsup/cygwin/fhandler.h  |   1 -
 winsup/cygwin/fhandler_tty.cc | 195 ++
 winsup/cygwin/nlsfuncs.cc |  54 +-
 winsup/cygwin/spawn.cc|  12 ---
 5 files changed, 60 insertions(+), 203 deletions(-)

diff --git a/winsup/cygwin/cygheap.h b/winsup/cygwin/cygheap.h
index 8877cc358c39..2b84f4252071 100644
--- a/winsup/cygwin/cygheap.h
+++ b/winsup/cygwin/cygheap.h
@@ -341,6 +341,7 @@ struct cygheap_debug
 struct cygheap_locale
 {
   mbtowc_p mbtowc;
+  UINT term_code_page;
 };
 
 struct user_heap_info
diff --git a/winsup/cygwin/fhandler.h b/winsup/cygwin/fhandler.h
index b4ba9428aa90..af619df5f9b1 100644
--- a/winsup/cygwin/fhandler.h
+++ b/winsup/cygwin/fhandler.h
@@ -2336,7 +2336,6 @@ class fhandler_pty_slave: public fhandler_pty_common
   void set_switch_to_pcon (void);
   void reset_switch_to_pcon (void);
   void mask_switch_to_pcon_in (bool mask);
-  void setup_locale (void);
 };
 
 #define __ptsname(buf, unit) __small_sprintf ((buf), "/dev/pty%d", (unit))
diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 8bf39c3e6cf2..f207a0b27892 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -1614,8 +1614,9 @@ fhandler_pty_master::write (const void *ptr, size_t len)
   if (to_be_read_from_pcon () && get_ttyp ()->h_pseudo_console)
 {
   size_t nlen;
-  char *buf = convert_mb_str
-   (CP_UTF8, &nlen, get_ttyp ()->term_code_page, (const char *) ptr, len);
+  char *buf = convert_mb_str (CP_UTF8, &nlen,
+ cygheap->locale.term_code_page,
+ (const char *) ptr, len);
 
   WaitForSingleObject (input_mutex, INFINITE);
 
@@ -1782,185 +1783,6 @@ fhandler_pty_common::set_close_on_exec (bool val)
   close_on_exec (val);
 }
 
-/

Re: [PATCH] Cygwin: create install dir for libs

2020-09-04 Thread Corinna Vinschen
On Aug 27 09:02, David McFarland via Cygwin-patches wrote:
> This fixes a race in parallel installs.
> ---
>  winsup/cygwin/Makefile.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/winsup/cygwin/Makefile.in b/winsup/cygwin/Makefile.in
> index fac81759e..ea0243033 100644
> --- a/winsup/cygwin/Makefile.in
> +++ b/winsup/cygwin/Makefile.in
> @@ -600,7 +600,7 @@ install: install-libs install-headers install-man 
> install-ldif install_target \
>  uninstall: uninstall-libs uninstall-headers uninstall-man
>  
>  install-libs: $(TARGET_LIBS)
> - @$(MKDIRP) $(DESTDIR)$(bindir)
> + @$(MKDIRP) $(DESTDIR)$(bindir) $(DESTDIR)$(tooldir)/lib
>   $(INSTALL_PROGRAM) $(TEST_DLL_NAME) $(DESTDIR)$(bindir)/$(DLL_NAME); \
>   for i in $^; do \
>   $(INSTALL_DATA) $$i $(DESTDIR)$(tooldir)/lib/`basename $$i` ; \
> -- 
> 2.28.0

Pushed.

Thanks,
Corinna


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Brian Inglis
On 2020-09-04 06:44, Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
>> Hi Corinna,
>>
>> On Thu, 3 Sep 2020 19:59:12 +0200
>> Corinna Vinschen wrote:
>>> The only idea I had so far was, changing the way __set_charset_from_locale
>>> works from within _setlocale_r:
>>>
>>> We could add a Cygwin-specific function only fetching the codepage and
>>> call it unconditionally from _setlocale_r.  __set_charset_from_locale is
>>> called with a new parameter "codepage", so it doesn't have to fetch the
>>> CP by itself, but it's still only called from _setlocale_r if necessary.
>>>
>>> Would that be sufficient?  The CP conversion from 20127/ASCII to 65001/UTF8
>>> could be done at the point the codepage is actually required.
>>
>> I think I have found the answer to your request.
>> Patch attached. What do you think of this patch?
>>
>> Calling initial_setlocale() is necessary because
>> nl_langinfo() always returns "ANSI_X3.4-1968"
>> regardless locale setting if this is not called.
> 
> Well, this is correct.  Per POSIX, a standard-conformant application is
> running in the "C" locale unless it calls setlocale() explicitely.
> That's one reason Cygwin defaults to UTF-8 internally.  Everything,
> including the terminal, is supposed to default to UTF-8.  That's the
> most sane default, even if an application is not locale-aware.
> 
> So, to follow POSIX, initial_setlocale() is used to set up the
> environment and command line stuff and then, before calling the
> application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
> to reset the apps default locale to "C".  That's why nl_langinfo()
> returns "ANSI_X3.4-1968".
> 
> However, the initial_setlocale() call in dll_crt0_1 calls
> internal_setlocale(), and *that* function sets the conversion functions
> for the internal conversions.  What it *doesn't* do yet at the moment is
> to store the charset name itself or, better, the equivalent codepage.
> 
> If we change that, setup_locale can simply go away.  Below is a patch
> doing just that.  Can you please check if that works in your test
> scenarios?
> 
> However, there's something which worries me.  Why do we need or set the
> pseudo terminal codepage at all?  I see that you convert from MB charset
> to MB charset and then use the result in WriteFile to the connecting
> pipes.  Question is this: Why not just converting the strings via
> sys_mbstowcs to a UTF-16 string and then send that over the line, using
> WriteConsoleW for the final output to the console?  That would simplify
> this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> the console, we simply don't need to know or care for the console CP.

IIRC his locale was ja_JP.UTF-8 but he got English messages on CP 932!

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Johannes Schindelin
Hi Takashi,

On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote:

> Hi Johannes and Corinna,
>
> On Tue, 1 Sep 2020 18:19:16 +0200 (CEST)
> Johannes Schindelin wrote:
>
> > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is
> > correct, but after that (at least if Pseudo Console support is enabled),
> > we try to find the default code page for that `LCID`, which is ASCII
> > (437). Subsequently, we set the Console output code page to that value,
> > completely ignoring that we wanted to use UTF-8.
> >
> > Let's not ignore the specifically asked-for UTF-8 character set.
> >
> > While at it, let's also set the Console output code page even if Pseudo
> > Console support is disabled; contrary to the behavior of v3.0.7, the
> > Console output code page is not ignored in that case.
> >
> > The most common symptom would be that console applications which do not
> > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text
> > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x.
> >
> > This fixes https://github.com/msys2/MSYS2-packages/issues/1974,
> > https://github.com/msys2/MSYS2-packages/issues/2012,
> > https://github.com/rust-lang/cargo/issues/8369,
> > https://github.com/git-for-windows/git/issues/2734,
> > https://github.com/git-for-windows/git/issues/2793,
> > https://github.com/git-for-windows/git/issues/2792, and possibly quite a
> > few others.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  winsup/cygwin/fhandler_tty.cc | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > index 06789a500..414c26992 100644
> > --- a/winsup/cygwin/fhandler_tty.cc
> > +++ b/winsup/cygwin/fhandler_tty.cc
> > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void)
> >char charset[ENCODING_LEN + 1] = "ASCII";
> >LCID lcid = get_langinfo (locale, charset);
> >
> > +  /* Special-case the UTF-8 character set */
> > +  if (strcasecmp (charset, "UTF-8") == 0)
> > +{
> > +  get_ttyp ()->term_code_page = CP_UTF8;
> > +  SetConsoleCP (CP_UTF8);
> > +  SetConsoleOutputCP (CP_UTF8);
> > +  return;
> > +}
> > +
> >/* Set console code page from locale */
> >if (get_pseudo_console ())
> >  {
> > --
> > 2.27.0
>
> I would like to propose a counter patch attached.
> What do you think of this patch?
>
> This patch does not treat UTF-8 as special.

Sure, but it only fixes the issue in `disable_pcon` mode in the current
tip commit. That's because a new Pseudo Console is created for every
spawned non-Cygwin console application, and that new Pseudo Console does
_not_ have the code page set by your patch.

I verified that this patch works around the issue in `disable_pcon` mode,
which is good.

Ciao,
Dscho


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Takashi Yano via Cygwin-patches
Hi Corinna,

On Fri, 4 Sep 2020 14:44:00 +0200
Corinna Vinschen wrote:
> Hi Takashi,
> 
> On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> > Hi Corinna,
> > 
> > On Thu, 3 Sep 2020 19:59:12 +0200
> > Corinna Vinschen wrote:
> > > The only idea I had so far was, changing the way __set_charset_from_locale
> > > works from within _setlocale_r:
> > > 
> > > We could add a Cygwin-specific function only fetching the codepage and
> > > call it unconditionally from _setlocale_r.  __set_charset_from_locale is
> > > called with a new parameter "codepage", so it doesn't have to fetch the
> > > CP by itself, but it's still only called from _setlocale_r if necessary.
> > > 
> > > Would that be sufficient?  The CP conversion from 20127/ASCII to 
> > > 65001/UTF8
> > > could be done at the point the codepage is actually required.
> > 
> > I think I have found the answer to your request.
> > Patch attached. What do you think of this patch?
> > 
> > Calling initial_setlocale() is necessary because
> > nl_langinfo() always returns "ANSI_X3.4-1968"
> > regardless locale setting if this is not called.
> 
> Well, this is correct.  Per POSIX, a standard-conformant application is
> running in the "C" locale unless it calls setlocale() explicitely.
> That's one reason Cygwin defaults to UTF-8 internally.  Everything,
> including the terminal, is supposed to default to UTF-8.  That's the
> most sane default, even if an application is not locale-aware.
> 
> So, to follow POSIX, initial_setlocale() is used to set up the
> environment and command line stuff and then, before calling the
> application's main, Cygwin calls _setlocale_r (_REENT, LC_CTYPE, "C");
> to reset the apps default locale to "C".  That's why nl_langinfo()
> returns "ANSI_X3.4-1968".
> 
> However, the initial_setlocale() call in dll_crt0_1 calls
> internal_setlocale(), and *that* function sets the conversion functions
> for the internal conversions.  What it *doesn't* do yet at the moment is
> to store the charset name itself or, better, the equivalent codepage.
> 
> If we change that, setup_locale can simply go away.  Below is a patch
> doing just that.  Can you please check if that works in your test
> scenarios?

I tried your patch, but unfortunately it does not work.
cygheap->locale.term_code_page is 0 in pty master.

If the following lines are moved in internal_setlocale(),

  const char *charset = __locale_charset (__get_global_locale ());
  debug_printf ("Global charset set to %s", charset);
  /* Store codepage to be utilized by pseudo console code. */
  cygheap->locale.term_code_page =
__eval_codepage_from_internal_charset (charset);

in internal_setlocale() before

  /* Don't do anything if the charset hasn't actually changed. */
  if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
return;

cygheap->locale.term_code_page is always 65001 even if mintty is
startted by
mintty -o locale=ja_JP -o charset=CP932
or
mintty -o locale=ja_JP -o charset=EUCJP

Perhaps, this is because LANG is not set properly yet when mintty
is started.

> However, there's something which worries me.  Why do we need or set the
> pseudo terminal codepage at all?  I see that you convert from MB charset
> to MB charset and then use the result in WriteFile to the connecting
> pipes.  Question is this: Why not just converting the strings via
> sys_mbstowcs to a UTF-16 string and then send that over the line, using
> WriteConsoleW for the final output to the console?  That would simplify
> this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> the console, we simply don't need to know or care for the console CP.

WriteConsoleW() cannot be used because the handle to_master_cyg is
not a console handle.

-- 
Takashi Yano 


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Takashi Yano via Cygwin-patches
Hi Johannes,

On Fri, 4 Sep 2020 08:23:42 +0200 (CEST)
Johannes Schindelin wrote:
> Hi Takashi,
> 
> On Fri, 4 Sep 2020, Takashi Yano via Cygwin-patches wrote:
> 
> > Hi Johannes and Corinna,
> >
> > On Tue, 1 Sep 2020 18:19:16 +0200 (CEST)
> > Johannes Schindelin wrote:
> >
> > > When `LANG=en_US.UTF-8`, the detected `LCID` is 0x0409, which is
> > > correct, but after that (at least if Pseudo Console support is enabled),
> > > we try to find the default code page for that `LCID`, which is ASCII
> > > (437). Subsequently, we set the Console output code page to that value,
> > > completely ignoring that we wanted to use UTF-8.
> > >
> > > Let's not ignore the specifically asked-for UTF-8 character set.
> > >
> > > While at it, let's also set the Console output code page even if Pseudo
> > > Console support is disabled; contrary to the behavior of v3.0.7, the
> > > Console output code page is not ignored in that case.
> > >
> > > The most common symptom would be that console applications which do not
> > > specifically call `SetConsoleOutputCP()` but output UTF-8-encoded text
> > > seem to be broken with v3.1.x when they worked plenty fine with v3.0.x.
> > >
> > > This fixes https://github.com/msys2/MSYS2-packages/issues/1974,
> > > https://github.com/msys2/MSYS2-packages/issues/2012,
> > > https://github.com/rust-lang/cargo/issues/8369,
> > > https://github.com/git-for-windows/git/issues/2734,
> > > https://github.com/git-for-windows/git/issues/2793,
> > > https://github.com/git-for-windows/git/issues/2792, and possibly quite a
> > > few others.
> > >
> > > Signed-off-by: Johannes Schindelin 
> > > ---
> > >  winsup/cygwin/fhandler_tty.cc | 9 +
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > index 06789a500..414c26992 100644
> > > --- a/winsup/cygwin/fhandler_tty.cc
> > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > @@ -2859,6 +2859,15 @@ fhandler_pty_slave::setup_locale (void)
> > >char charset[ENCODING_LEN + 1] = "ASCII";
> > >LCID lcid = get_langinfo (locale, charset);
> > >
> > > +  /* Special-case the UTF-8 character set */
> > > +  if (strcasecmp (charset, "UTF-8") == 0)
> > > +{
> > > +  get_ttyp ()->term_code_page = CP_UTF8;
> > > +  SetConsoleCP (CP_UTF8);
> > > +  SetConsoleOutputCP (CP_UTF8);
> > > +  return;
> > > +}
> > > +
> > >/* Set console code page from locale */
> > >if (get_pseudo_console ())
> > >  {
> > > --
> > > 2.27.0
> >
> > I would like to propose a counter patch attached.
> > What do you think of this patch?
> >
> > This patch does not treat UTF-8 as special.
> 
> Sure, but it only fixes the issue in `disable_pcon` mode in the current
> tip commit. That's because a new Pseudo Console is created for every
> spawned non-Cygwin console application, and that new Pseudo Console does
> _not_ have the code page set by your patch.

You are right. However, if pseudo console is enabled, the app
which works correclty in command prompt should work as well in
pseudo console. Therefore, there is nothing to be fixed.

> I verified that this patch works around the issue in `disable_pcon` mode,
> which is good.

Thanks for testing.

-- 
Takashi Yano 


Re: [PATCH 3/3] fhandler_pty_slave::setup_locale: respect charset == "UTF-8"

2020-09-04 Thread Corinna Vinschen
Hi Takashi,

On Sep  4 23:50, Takashi Yano via Cygwin-patches wrote:
> Hi Corinna,
> 
> On Fri, 4 Sep 2020 14:44:00 +0200
> Corinna Vinschen wrote:
> > On Sep  4 18:21, Takashi Yano via Cygwin-patches wrote:
> > > I think I have found the answer to your request.
> > > Patch attached. What do you think of this patch?
> > > 
> > > Calling initial_setlocale() is necessary because
> > > nl_langinfo() always returns "ANSI_X3.4-1968"
> > > regardless locale setting if this is not called.
> > [...]
> > However, the initial_setlocale() call in dll_crt0_1 calls
> > internal_setlocale(), and *that* function sets the conversion functions
> > for the internal conversions.  What it *doesn't* do yet at the moment is
> > to store the charset name itself or, better, the equivalent codepage.
> > 
> > If we change that, setup_locale can simply go away.  Below is a patch
> > doing just that.  Can you please check if that works in your test
> > scenarios?
> 
> I tried your patch, but unfortunately it does not work.
> cygheap->locale.term_code_page is 0 in pty master.
> 
> If the following lines are moved in internal_setlocale(),
> 
>   const char *charset = __locale_charset (__get_global_locale ());
>   debug_printf ("Global charset set to %s", charset);
>   /* Store codepage to be utilized by pseudo console code. */
>   cygheap->locale.term_code_page =
> __eval_codepage_from_internal_charset (charset);
> 
> in internal_setlocale() before
> 
>   /* Don't do anything if the charset hasn't actually changed. */
>   if (cygheap->locale.mbtowc == __get_global_locale ()->mbtowc)
> return;

Uh, that makes sense.

> cygheap->locale.term_code_page is always 65001 even if mintty is
> startted by
> mintty -o locale=ja_JP -o charset=CP932
> or
> mintty -o locale=ja_JP -o charset=EUCJP
> 
> Perhaps, this is because LANG is not set properly yet when mintty
> is started.

Yeah, that's the reason.  The above settings of locale and charset on
the CLI should only take over when mintty calls setlocale() with a
matching string.  The fact that it sets the matching value in the
environment, too, should only affect child processes, not mintty itself.

But it's incorrect to call initial_setlocale() from setup_locale()
without resetting it to its original value.

Unfortunately that doesn't solve any problem with the pseudo console
codepage.  Drat.  It sounds like you need the terminal's charset,
rather than the one set in the environment.

So this boils down to the fact that term_code_page must be set
after the application is already running and as soo as it creates
the pty, me thinks.  What if __eval_codepage_from_internal_charset()
is called at pty creation?  Or even on reading from /writing to
the pty the first time?  That should always be late enough to fetch
the correct codepage.

Patch attached.  Does that work as expected?

> > However, there's something which worries me.  Why do we need or set the
> > pseudo terminal codepage at all?  I see that you convert from MB charset
> > to MB charset and then use the result in WriteFile to the connecting
> > pipes.  Question is this: Why not just converting the strings via
> > sys_mbstowcs to a UTF-16 string and then send that over the line, using
> > WriteConsoleW for the final output to the console?  That would simplify
> > this stuff quite a bit, wouldn't it?  After all, for writing UTF-16 to
> > the console, we simply don't need to know or care for the console CP.
> 
> WriteConsoleW() cannot be used because the handle to_master_cyg is
> not a console handle.

Sigh, of course.  It's all just pipes.  I forgot that, sorry.

Btw., the main loop in fhandler_pty_master::pty_master_fwd_thread()
calls 

  char *buf = convert_mb_str (cygheap->locale.term_code_page,
  &nlen, CP_UTF8, ptr, wlen);
 ^^^
  [...]
  WriteFile (to_master_cyg, ...

But then, after the code breaks from that loop, it calls

  char *buf = convert_mb_str (cygheap->locale.term_code_page, &nlen,
  GetConsoleOutputCP (), ptr, wlen);
  ^
  [...]
  process_opost_output (to_master_cyg, ...

process_opost_output then calls WriteFile on that to_master_cyg handle,
just like the WriteFile call above.

Is that really correct?  Shouldn't the second invocation use CP_UTF8 as
well?


Corinna
>From 89acd62e88871a89b9bcb2964924f8960c7673ec Mon Sep 17 00:00:00 2001
From: Corinna Vinschen 
Date: Fri, 4 Sep 2020 21:20:35 +0200
Subject: [PATCH] Cygwin: try calling __eval_codepage_from_internal_charset in
 pty code

the idea being, that we need the tty's locale and charset, not the
environment locale settings when starting the tty.
---
 winsup/cygwin/fhandler_tty.cc | 17 +
 winsup/cygwin/nlsfuncs.cc | 12 +---
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index f207a0b27892..feb014175123 100644

[PATCH v4 2/3] regparm: make code highlight happy

2020-09-04 Thread Mingye Wang
---
 winsup/cygwin/regparm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/winsup/cygwin/regparm.h b/winsup/cygwin/regparm.h
index cce1bab4a..a14c501db 100644
--- a/winsup/cygwin/regparm.h
+++ b/winsup/cygwin/regparm.h
@@ -8,7 +8,7 @@ details. */
 
 #pragma once
 
-#if defined (__x86_64__) || defined (__CYGMAGIC__)
+#if defined (__x86_64__) || defined (__CYGMAGIC__) || !defined (__GNUC__)
 # define __reg1
 # define __reg2
 # define __reg3
-- 
2.20.1.windows.1


[PATCH v4 1/3] Cygwin: rewrite and make public cmdline parser

2020-09-04 Thread Mingye Wang
This commit rewrites the cmdline parser to achieve the following:
* MSVCRT compatibility. Except for the single-quote handling (an
  extension for compatibility with old Cygwin), the parser now
  interprets option boundaries exactly like MSVCR since 2008. This fixes
  the issue where our escaping does not work with our own parsing.
* Clarity. Since globify() is no longer responsible for handling the
  opening and closing of quotes, the code is much simpler.
* Sanity. The GLOB_NOCHECK flag is removed, so a failed glob correctly
  returns the literal value. Without the change, anything path-like
  would be garbled by globify's escaping.
* A memory leak in the @file expansion is removed by rewriting it to use
  a stack of buffers. This also simplifies the code since we no longer
  have to move stuff. The "downside" is that tokens can no longer cross
  file boundaries.

Some clarifications are made in the documentation for when globs are not
expanded.  The functions are made public for testing, but my tcl setup
is currently too messed up for running them!  I did test them as an
isolated program on WSL-Debian.

The change fixes two complaints of mine:
* That cygwin is incompatible with its own escape.[1]
* That there is no way to echo `C:\"` from win32.[2]
  [1]: https://cygwin.com/pipermail/cygwin/2020-June/245162.html
  [2]: https://cygwin.com/pipermail/cygwin/2019-October/242790.html

(It's never the point to spawn cygwin32 from cygwin64. Consistency
matters: with yourself always, and with the outside world when you are
supposed to.)

This is the fourth version of the patch. I provide all my patches to
Cygwin, including this one and all future ones, under the 2-clause BSD
license.
---
 winsup/cygwin/common.din  |   2 +
 winsup/cygwin/dcrt0.cc| 297 +---
 winsup/cygwin/include/sys/cygwin.h|   2 +
 winsup/cygwin/winf.cc | 378 +-
 winsup/cygwin/winf.h  |   8 +-
 winsup/cygwin/winsup.h|   7 +-
 winsup/doc/cygwinenv.xml  |   8 +-
 winsup/doc/faq-api.xml|   2 +-
 winsup/doc/misc-funcs.xml |  71 +
 winsup/testsuite/Makefile.in  |   6 +-
 winsup/testsuite/winsup.api/cmdline.c | 102 +++
 11 files changed, 577 insertions(+), 306 deletions(-)
 create mode 100644 winsup/testsuite/winsup.api/cmdline.c

diff --git a/winsup/cygwin/common.din b/winsup/cygwin/common.din
index a7b4aa2b0..f15dc6a9e 100644
--- a/winsup/cygwin/common.din
+++ b/winsup/cygwin/common.din
@@ -393,6 +393,8 @@ cygwin_split_path NOSIGFE
 cygwin_stackdump SIGFE
 cygwin_umount SIGFE
 cygwin_winpid_to_pid SIGFE
+cygwin_cmdline_parse SIGFE
+cygwin_cmdline_build SIGFE
 daemon SIGFE
 dbm_clearerr SIGFE
 dbm_close SIGFE
diff --git a/winsup/cygwin/dcrt0.cc b/winsup/cygwin/dcrt0.cc
index 5d8b4b74e..6f906102d 100644
--- a/winsup/cygwin/dcrt0.cc
+++ b/winsup/cygwin/dcrt0.cc
@@ -10,7 +10,6 @@ details. */
 #include "miscfuncs.h"
 #include 
 #include 
-#include "glob.h"
 #include 
 #include 
 #include 
@@ -35,6 +34,7 @@ details. */
 #include "cygxdr.h"
 #include "fenv.h"
 #include "ntdll.h"
+#include "winf.h"
 
 #define MAX_AT_FILE_LEVEL 10
 
@@ -77,292 +77,6 @@ do_global_ctors (void (**in_pfunc)(), int force)
 (*pfunc) ();
 }
 
-/*
- * Replaces @file in the command line with the contents of the file.
- * There may be multiple @file's in a single command line
- * A \@file is replaced with @file so that echo \@foo would print
- * @foo and not the contents of foo.
- */
-static bool __stdcall
-insert_file (char *name, char *&cmd)
-{
-  HANDLE f;
-  DWORD size;
-  tmp_pathbuf tp;
-
-  PWCHAR wname = tp.w_get ();
-  sys_mbstowcs (wname, NT_MAX_PATH, name + 1);
-  f = CreateFileW (wname,
-  GENERIC_READ,/* open for reading */
-  FILE_SHARE_VALID_FLAGS,  /* share for reading*/
-  &sec_none_nih,   /* default security */
-  OPEN_EXISTING,   /* existing file only   */
-  FILE_ATTRIBUTE_NORMAL,   /* normal file  */
-  NULL);   /* no attr. template*/
-
-  if (f == INVALID_HANDLE_VALUE)
-{
-  debug_printf ("couldn't open file '%s', %E", name);
-  return false;
-}
-
-  /* This only supports files up to about 4 billion bytes in
- size.  I am making the bold assumption that this is big
- enough for this feature */
-  size = GetFileSize (f, NULL);
-  if (size == 0x)
-{
-  CloseHandle (f);
-  debug_printf ("couldn't get file size for '%s', %E", name);
-  return false;
-}
-
-  int new_size = strlen (cmd) + size + 2;
-  char *tmp = (char *) malloc (new_size);
-  if (!tmp)
-{
-  CloseHandle (f);
-  debug_printf ("malloc failed, %E");
-  return false;
-}
-
-  /* realloc passed as it should */
-  DWORD rf_read;
-  BOOL rf_result;
-  rf_result = ReadFile (f, 

[PATCH v4 3/3] testsuite: don't strip dir from obj files

2020-09-04 Thread Mingye Wang
Make has no idea how to build an o file when the correspoinding c file
is not there. That happens when we strip the dir.
---
 winsup/testsuite/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/winsup/testsuite/Makefile.in b/winsup/testsuite/Makefile.in
index bdc116d12..ad3656be8 100644
--- a/winsup/testsuite/Makefile.in
+++ b/winsup/testsuite/Makefile.in
@@ -75,7 +75,7 @@ endif
 RUNTIME=$(cygwin_build)/cygwin0.dll $(cygwin_build)/libcygwin0.a
 
 TESTSUP_LIB_NAME:=libltp.a
-TESTSUP_OFILES:=${sort ${addsuffix .o,${basename ${notdir ${wildcard 
$(libltp_srcdir)/lib/*.c}
+TESTSUP_OFILES:=${sort ${addsuffix .o,${basename ${wildcard 
$(libltp_srcdir)/lib/*.c
 
 override ALL_CFLAGS:=${filter-out -O%,$(ALL_CFLAGS)}
 override COMPILE_CC:=${filter-out -O%,$(COMPILE_CC)}
-- 
2.20.1.windows.1