[PATCH] Cygwin: net: Make if_nametoindex, etc. consistent with if_nameindex.

2024-02-03 Thread Takashi Yano
Currently, if_nametoindex() and if_indextoname() handle interface names
such as "ethernet_32777", while if_nameindex() returns the names such
as "{5AF7ACD0-D52E-4DFC-A4D0-54D3E6D6B2AC}". This patch unifies the
interface names to the latter.

Fixes: c356901f0d69 ("Rename if_indextoname to cygwin_if_indextoname (analag 
for if_nametoindex)")
Reviewed-by:
Signed-off-by: Takashi Yano 
---
 winsup/cygwin/autoload.cc |  2 --
 winsup/cygwin/net.cc  | 31 +--
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/winsup/cygwin/autoload.cc b/winsup/cygwin/autoload.cc
index c1a124c1d..7e610bdd0 100644
--- a/winsup/cygwin/autoload.cc
+++ b/winsup/cygwin/autoload.cc
@@ -462,8 +462,6 @@ LoadDLLfunc (GetNetworkParams, iphlpapi)
 LoadDLLfunc (GetTcpTable, iphlpapi)
 LoadDLLfunc (GetTcp6Table, iphlpapi)
 LoadDLLfunc (GetUdpTable, iphlpapi)
-LoadDLLfunc (if_indextoname, iphlpapi)
-LoadDLLfunc (if_nametoindex, iphlpapi)
 
 LoadDLLfuncEx2 (DiscardVirtualMemory, kernel32, 1, 127)
 LoadDLLfuncEx (ClosePseudoConsole, kernel32, 1)
diff --git a/winsup/cygwin/net.cc b/winsup/cygwin/net.cc
index 8840d5ead..08c584fe5 100644
--- a/winsup/cygwin/net.cc
+++ b/winsup/cygwin/net.cc
@@ -2001,13 +2001,40 @@ get_ifconf (struct ifconf *ifc, int what)
 extern "C" unsigned
 cygwin_if_nametoindex (const char *name)
 {
-  return (unsigned) ::if_nametoindex (name);
+  PIP_ADAPTER_ADDRESSES pa0 = NULL, pap;
+  if (get_adapters_addresses (&pa0, AF_UNSPEC))
+for (pap = pa0; pap; pap = pap->Next)
+  if (strcmp (name, pap->AdapterName) == 0)
+   {
+ free (pa0);
+ return pap->IfIndex;
+   }
+  if (pa0)
+free (pa0);
+  return 0;
 }
 
 extern "C" char *
 cygwin_if_indextoname (unsigned ifindex, char *ifname)
 {
-  return ::if_indextoname (ifindex, ifname);
+  if (ifindex == 0 || ifname == NULL)
+{
+  set_errno (ENXIO);
+  return NULL;
+}
+  PIP_ADAPTER_ADDRESSES pa0 = NULL, pap;
+  if (get_adapters_addresses (&pa0, AF_UNSPEC))
+for (pap = pa0; pap; pap = pap->Next)
+  if (ifindex == pap->IfIndex)
+   {
+ strcpy (ifname, pap->AdapterName);
+ free (pa0);
+ return ifname;
+   }
+  if (pa0)
+free (pa0);
+  set_errno (ENXIO);
+  return NULL;
 }
 
 extern "C" struct if_nameindex *
-- 
2.43.0



Re: [PATCH] Cygwin: console: Fix exit code for non-cygwin process.

2024-02-03 Thread Johannes Schindelin
Hi Takashi,

On Fri, 2 Feb 2024, Takashi Yano wrote:

> If non-cygwin process is executed in console, the exit code is not
> set correctly. This is because the stub process for non-cygwin app
> crashes in fhandler_console::set_disable_master_thread() due to NULL
> pointer dereference. This bug was introduced by the commit:
> 3721a756b0d8 ("Cygwin: console: Make the console accessible from
> other terminals."), that the pointer cons is accessed before fixing
> when it is NULL. This patch fixes the issue.
>
> Fixes: 3721a756b0d8 ("Cygwin: console: Make the console accessible from other 
> terminals.")
> Reported-by: Johannes Schindelin 
> Signed-off-by: Takashi Yano 

Thank you for fixing this so swiftly. I still wish the logic was
drastically simpler to understand so that even mere humans like myself
would stand a chance to fix such bugs, but I am happy that it is fixed
now.

Ciao,
Johannes


Re: [PATCH] Cygwin: console: Fix exit code for non-cygwin process.

2024-02-03 Thread Johannes Schindelin
Hi Takashi,

On Sat, 3 Feb 2024, Johannes Schindelin wrote:

> On Fri, 2 Feb 2024, Takashi Yano wrote:
>
> > If non-cygwin process is executed in console, the exit code is not
> > set correctly. This is because the stub process for non-cygwin app
> > crashes in fhandler_console::set_disable_master_thread() due to NULL
> > pointer dereference. This bug was introduced by the commit:
> > 3721a756b0d8 ("Cygwin: console: Make the console accessible from
> > other terminals."), that the pointer cons is accessed before fixing
> > when it is NULL. This patch fixes the issue.
> >
> > Fixes: 3721a756b0d8 ("Cygwin: console: Make the console accessible from 
> > other terminals.")
> > Reported-by: Johannes Schindelin 
> > Signed-off-by: Takashi Yano 
>
> Thank you for fixing this so swiftly. I still wish the logic was
> drastically simpler to understand so that even mere humans like myself
> would stand a chance to fix such bugs, but I am happy that it is fixed
> now.

On IRC, you reported that the thread would crash if `cons` was not fixed
up. The symptom was that that crash would apparently prevent the exit code
from being read, and it would be left at 0, indicating potentially
incorrectly that the non-Cygwin process succeeded.

I wonder: What would it take to change this logic so that the crash would
be detected (and not be misinterpreted as exit code 0)?

Ciao,
Johannes


Re: [PATCH] Cygwin: console: Avoid slipping past disable_master_thread check.

2024-02-03 Thread Johannes Schindelin
Hi Takashi,

On Sat, 3 Feb 2024, Takashi Yano wrote:

> If disable_master_thread flag is set between the code checking that
> flag not be set and the code acquiring input_mutex, input record is
> processed once after setting disable_master_thread flag. This patch
> prevents that.
>
> Fixes: d4aacd50e6cf ("Cygwin: console: Add missing input_mutex guard.")
> Signed-off-by: Takashi Yano 

Thank you for double-checking this (and for finding and fixing the bug).

I wonder what could be a symptom of this bug. I ask because we have
noticed a couple of inexplicable hangs in GitHub workflow runs in the Git
for Windows and the MSYS2 projects, hangs that are almost certainly due to
the ConPTY code in Cygwin, and I hope to figure out what causes them, and
even more importantly, how to fix those. Maybe the bug you fixed in this
patch could explain this?

Concretely, the hangs occur typically when some `pacman` process (a
package manager using the MSYS2 runtime, i.e. the Cygwin runtime with
several dozen patches on top) calls a few non-Cygwin processes. Those
processes seem to succeed, but there is an extra `pacman` process left
hanging around (reported using the same command-line as its parent
process, indicating a `fork()`ed child process or maybe that
signal-handler that is spawned for every non-Cygwin child process) and at
that point the program hangs indefinitely (or at least until the GitHub
workflow run times out after 6 hours).

I was not able to obtain any helpful stacktraces, they all seem to make no
sense, I only vaguely remember that one thread was waiting for an object,
but that could be a false flag.

Stopping those hanging `pacman` processes via `wmic process ... delete`
counter-intuitively fails to result in `pacman` to exit with a non-zero
exit code. Instead, the program now runs to completion successfully!

Most recently, these hangs became almost reliably reproducible, when we
started changing a certain GitHub workflow that runs on a Windows/ARM64
build agent. I suspect that Windows/ARM64 just makes this much more
likely, but that the bug is also there on regular Windows/x86_64.

What changed in the GitHub workflow is a new PowerShell script that runs
`pacman` a couple of times, trying to update packages, and after that
force-reinstalls a certain package for the benefit of its post-install
script. This post-install script is run using the (MSYS) Bash, and it
calls among other things (non-MSYS) `git.exe`. And that's where it hangs
almost every time.

When I log into the build agent via RDP, I do not see any `git.exe`
process running, but the same type of hanging `pacman` process as
indicated above. Using the `wmic` command to stop the hanging process lets
the GitHub workflow continue without any indication of failure.

Running the PowerShell script manually succeeds every single time, so I
think the hang might be connected to running things headlessly.

Do you have any idea what the bug could be? Or how I could diagnose this
better? Attaching via `gdb` only produces unhelpful stacktraces (that may
even be bogus, by the looks of it). Or do you think that your patch that I
am replying to could potentially fix this problem? How could the code be
improved to avoid those hangs altogether, or at least to make them easier
to diagnose?

Ciao,
Johannes

> ---
>  winsup/cygwin/fhandler/console.cc | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/winsup/cygwin/fhandler/console.cc 
> b/winsup/cygwin/fhandler/console.cc
> index 6a42b4949..1c8d383cd 100644
> --- a/winsup/cygwin/fhandler/console.cc
> +++ b/winsup/cygwin/fhandler/console.cc
> @@ -420,6 +420,12 @@ fhandler_console::cons_master_thread (handle_set_t *p, 
> tty *ttyp)
>   }
>
>WaitForSingleObject (p->input_mutex, mutex_timeout);
> +  /* Ensure accessing input recored is not disabled. */
> +  if (con.disable_master_thread)
> + {
> +   ReleaseMutex (p->input_mutex);
> +   continue;
> + }
>total_read = 0;
>switch (cygwait (p->input_handle, (DWORD) 0))
>   {
> @@ -4545,8 +4551,6 @@ fhandler_console::set_disable_master_thread (bool x, 
> fhandler_console *cons)
>   return;
>  }
>const _minor_t unit = cons->get_minor ();
> -  if (con.disable_master_thread == x)
> -return;
>cons->acquire_input_mutex (mutex_timeout);
>con.disable_master_thread = x;
>cons->release_input_mutex ();
> --
> 2.43.0
>
>


Re: [PATCH] Cygwin: console: Fix exit code for non-cygwin process.

2024-02-03 Thread Takashi Yano
On Sat, 3 Feb 2024 15:27:06 +0100 (CET)
Johannes Schindelin wrote:
> On IRC, you reported that the thread would crash if `cons` was not fixed
> up. The symptom was that that crash would apparently prevent the exit code
> from being read, and it would be left at 0, indicating potentially
> incorrectly that the non-Cygwin process succeeded.
> 
> I wonder: What would it take to change this logic so that the crash would
> be detected (and not be misinterpreted as exit code 0)?

I am not sure, but I think it is necessary to modify:
pinfo::exit()
pinfo::meybe_set_exit_code_from_windows()
pinfo::set_exit_code()

I guess detecting crash of sbub process needs modification of
spawn.cc.

-- 
Takashi Yano 


Re: [PATCH] Cygwin: console: Fix exit code for non-cygwin process.

2024-02-03 Thread Corinna Vinschen
On Feb  4 00:04, Takashi Yano wrote:
> On Sat, 3 Feb 2024 15:27:06 +0100 (CET)
> Johannes Schindelin wrote:
> > On IRC, you reported that the thread would crash if `cons` was not fixed
> > up. The symptom was that that crash would apparently prevent the exit code
> > from being read, and it would be left at 0, indicating potentially
> > incorrectly that the non-Cygwin process succeeded.
> > 
> > I wonder: What would it take to change this logic so that the crash would
> > be detected (and not be misinterpreted as exit code 0)?
> 
> I am not sure, but I think it is necessary to modify:
> pinfo::exit()
> pinfo::meybe_set_exit_code_from_windows()
> pinfo::set_exit_code()
> 
> I guess detecting crash of sbub process needs modification of
> spawn.cc.

Dumb question: If, as Johannes said, the error code cannot be fetched,
can't we set the error code to a POSIX return code indicating a signal?

I.e., checking for WIFSIGNALED() returns 1 and WTERMSIG() returns, say,
SIGKILL or something?


Corinna


Re: [PATCH] Cygwin: console: Avoid slipping past disable_master_thread check.

2024-02-03 Thread Takashi Yano
On Sat, 3 Feb 2024 15:53:29 +0100 (CET)
Johannes Schindelin wrote:
> I wonder what could be a symptom of this bug. I ask because we have
> noticed a couple of inexplicable hangs in GitHub workflow runs in the Git
> for Windows and the MSYS2 projects, hangs that are almost certainly due to
> the ConPTY code in Cygwin, and I hope to figure out what causes them, and
> even more importantly, how to fix those. Maybe the bug you fixed in this
> patch could explain this?

I don't think so. The symptom of this bug is that console input for
non-cygwin process can be lost or reordered at the starting timing
of that program for about 40 msec.

> Concretely, the hangs occur typically when some `pacman` process (a
> package manager using the MSYS2 runtime, i.e. the Cygwin runtime with
> several dozen patches on top) calls a few non-Cygwin processes. Those
> processes seem to succeed, but there is an extra `pacman` process left
> hanging around (reported using the same command-line as its parent
> process, indicating a `fork()`ed child process or maybe that
> signal-handler that is spawned for every non-Cygwin child process) and at
> that point the program hangs indefinitely (or at least until the GitHub
> workflow run times out after 6 hours).
> 
> I was not able to obtain any helpful stacktraces, they all seem to make no
> sense, I only vaguely remember that one thread was waiting for an object,
> but that could be a false flag.
> 
> Stopping those hanging `pacman` processes via `wmic process ... delete`
> counter-intuitively fails to result in `pacman` to exit with a non-zero
> exit code. Instead, the program now runs to completion successfully!
> 
> Most recently, these hangs became almost reliably reproducible, when we
> started changing a certain GitHub workflow that runs on a Windows/ARM64
> build agent. I suspect that Windows/ARM64 just makes this much more
> likely, but that the bug is also there on regular Windows/x86_64.
> 
> What changed in the GitHub workflow is a new PowerShell script that runs
> `pacman` a couple of times, trying to update packages, and after that
> force-reinstalls a certain package for the benefit of its post-install
> script. This post-install script is run using the (MSYS) Bash, and it
> calls among other things (non-MSYS) `git.exe`. And that's where it hangs
> almost every time.
> 
> When I log into the build agent via RDP, I do not see any `git.exe`
> process running, but the same type of hanging `pacman` process as
> indicated above. Using the `wmic` command to stop the hanging process lets
> the GitHub workflow continue without any indication of failure.
> 
> Running the PowerShell script manually succeeds every single time, so I
> think the hang might be connected to running things headlessly.
> 
> Do you have any idea what the bug could be? Or how I could diagnose this
> better? Attaching via `gdb` only produces unhelpful stacktraces (that may
> even be bogus, by the looks of it). Or do you think that your patch that I
> am replying to could potentially fix this problem? How could the code be
> improved to avoid those hangs altogether, or at least to make them easier
> to diagnose?

Thanks for details of your problem. Unfortunately, I have currently no idea.
Non-cygwin process is now treated very different from cygwin process because
it of course does not use cygwin functionality. For example:
(1) Cygwin setups pseudo-console for non-cygwin process in pty. Because pty is
just a pipe for non-cygwin program and some programs do not work correctly
on the pipe.
(2) Cygwin restores the default console mode for non-cygwin process if it is
running in console. This is because, cygwin uses xterm compatible mode
for cygwin process which is enabled by ENABLE_VIRTUAL_TYERMINAL_PROCESSING
and ENABLE_VIRTUAL_TERMINAL_INPUT.
However, less cygwin users runs non-cygwin process in cygwin than MSYS2 users.
MSYS2 users uses non-MSYS2 program under MSYS2 environment on a daily basis,
so probability to hit a bug of (1) and (2) is relatively higher than cygwin
users.

If you attached gdb to non-msys2 program and failed to retrieve meaningfull
information, how about attaching gdb to the stub process? Note that the stub
process is hidden from normal cygwin tools such as 'ps'. However, it is real
parent process of the non-msys2 program and you can find it in taskmanager.

-- 
Takashi Yano 


Re: [PATCH] Cygwin: console: Avoid slipping past disable_master_thread check.

2024-02-03 Thread Jeremy Drake via Cygwin-patches
Thanks for taking the time to write this up, this issue has been bugging
me for years... (see also:
https://cygwin.com/pipermail/cygwin-patches/2021q4/011638.html)

On Sat, 3 Feb 2024, Johannes Schindelin wrote:

> Concretely, the hangs occur typically when some `pacman` process (a
> package manager using the MSYS2 runtime, i.e. the Cygwin runtime with
> several dozen patches on top) calls a few non-Cygwin processes. Those
> processes seem to succeed, but there is an extra `pacman` process left
> hanging around (reported using the same command-line as its parent
> process, indicating a `fork()`ed child process or maybe that
> signal-handler that is spawned for every non-Cygwin child process) and at
> that point the program hangs indefinitely (or at least until the GitHub
> workflow run times out after 6 hours).

I don't think this requires running non-Cygwin children - I see this most
often when pacman is using GPGME to validate signatueres.  That
fork/exec's (Cygwin) gpg.

> I was not able to obtain any helpful stacktraces, they all seem to make no
> sense, I only vaguely remember that one thread was waiting for an object,
> but that could be a false flag.

My recollection when I tried to debug was that every debugger I tried got
an error trying to get the context of the main thread of the hung process.
There was another thread, which seemed to be for Cygwin signal handling,
which was blithely waiting for an object, but I kind of expect that's what
it was supposed to be doing.

> Stopping those hanging `pacman` processes via `wmic process ... delete`
> counter-intuitively fails to result in `pacman` to exit with a non-zero
> exit code. Instead, the program now runs to completion successfully!

> Do you have any idea what the bug could be? Or how I could diagnose this
> better? Attaching via `gdb` only produces unhelpful stacktraces (that may
> even be bogus, by the looks of it). Or do you think that your patch that I
> am replying to could potentially fix this problem? How could the code be
> improved to avoid those hangs altogether, or at least to make them easier
> to diagnose?