Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Thomas Wolff Fri, 11 Sep 2020 08:11:20 -0700

On 11.09.2020 at 16:06, Corinna Vinschen wrote:

On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:

Hi Corinna,


On Fri, 11 Sep 2020 14:08:40 +0200
Corinna Vinschen wrote:

On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:

- In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
   for the case where a multibyte char is split in the middle.
   The reasons are as follows.
   * ISO-2022 is too complicated to handle correctly.
   * Not sure what to do with ISCII.
---
  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 37d033bbe..ee5c6a90a 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -117,6 +117,9 @@ CreateProcessW_Hooked
    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
  }

+#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
+#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
+
  static void
  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
                UINT cp_from, const char *ptr_from, size_t len_from,
@@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
    tmp_pathbuf tp;
    wchar_t *wbuf = tp.w_get ();
    int wlen = 0;
-  if (cp_from == CP_UTF7)
-    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
+    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+       - ISO-2022 is too complicated to handle correctly.
+       - FIXME: Not sure what to do for ISCII.
         Therefore, just convert string without checking */
      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
                                wbuf, NT_MAX_PATH);
--
2.28.0
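The condition added by the patch can be tried outside Cygwin. The following is a minimal sketch in plain C that assumes no Windows headers: SKETCH_CP_UTF7 is a local stand-in for CP_UTF7 (65000 in the Win32 headers), and skips_invalid_char_check() is a hypothetical helper, not a function in fhandler_tty.cc. As far as the Win32 documentation for MultiByteToWideChar goes, it lists 50220, 50221, 50222, 50225, 50227, 50229, 57002 through 57011 and 65000 (plus 42, Symbol) as codepages for which dwFlags must be 0, which would explain why MB_ERR_INVALID_CHARS cannot be used for them.

```c
#include <stdbool.h>

/* Stand-in for CP_UTF7 from the Win32 headers (value 65000), so the
   sketch builds without <windows.h>.  Hypothetical name. */
#define SKETCH_CP_UTF7 65000u

/* Same ranges as the macros added by the patch. */
#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )

/* Hypothetical helper mirroring the patched if condition: true means
   the string is converted with flags == 0, i.e. without
   MB_ERR_INVALID_CHARS and without the split-character handling. */
static bool
skips_invalid_char_check (unsigned int cp)
{
  return cp == SKETCH_CP_UTF7 || IS_ISO_2022 (cp) || IS_ISCII (cp);
}
```

For example, skips_invalid_char_check (50220) is true, while skips_invalid_char_check (65001) (UTF-8) and skips_invalid_char_check (932) (Shift-JIS) are false, so those codepages keep going through the checked path.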

I'd prefer to not handle them at all.  We just don't support these
charsets, same as JIS, EBCDIC, you name it, which are not ASCII
compatible.  Let's please just drop any handling for these weird
or outdated codepages.
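To illustrate why JIS/ISO-2022 is not ASCII compatible in this sense: in ISO-2022-JP the meaning of bytes 0x21..0x7E depends on the last escape sequence, so 0x42 is 'B' in ASCII mode but half of a kanji after ESC $ B. The sketch below uses hypothetical names and handles only the four usual ISO-2022-JP designations; it reports the mode in effect at the end of a buffer. A converter that processes output in chunks would have to carry this state from one write to the next, including an escape sequence cut in half, which is part of what makes ISO-2022 hard to handle here.

```c
#include <stddef.h>

/* Hypothetical mode enum for a minimal ISO-2022-JP scanner. */
enum iso2022jp_mode
{
  MODE_ASCII,      /* ESC ( B */
  MODE_JIS_ROMAN,  /* ESC ( J : JIS X 0201 Roman */
  MODE_JIS_C6226,  /* ESC $ @ : JIS C 6226-1978 */
  MODE_JIS_X0208   /* ESC $ B : JIS X 0208-1983 */
};

/* Return the mode in effect after scanning buf[0..len), starting in
   'mode'.  Unknown escape sequences, and a sequence truncated at the
   end of the buffer, leave the mode unchanged. */
static enum iso2022jp_mode
iso2022jp_mode_after (const unsigned char *buf, size_t len,
                      enum iso2022jp_mode mode)
{
  for (size_t i = 0; i + 2 < len; i++)
    {
      if (buf[i] != 0x1b)
        continue;
      if (buf[i + 1] == '(' && buf[i + 2] == 'B')
        mode = MODE_ASCII;
      else if (buf[i + 1] == '(' && buf[i + 2] == 'J')
        mode = MODE_JIS_ROMAN;
      else if (buf[i + 1] == '$' && buf[i + 2] == '@')
        mode = MODE_JIS_C6226;
      else if (buf[i + 1] == '$' && buf[i + 2] == 'B')
        mode = MODE_JIS_X0208;
      else
        continue;
      i += 2;  /* skip the rest of the recognized escape sequence */
    }
  return mode;
}
```

Note that a chunk ending in just ESC $ reports the previous mode: the state change only becomes visible once the next chunk arrives.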

What do you mean by "just drop any handling"?

Do you mean removing the following if block?

+  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
+    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+       - ISO-2022 is too complicated to handle correctly.
+       - FIXME: Not sure what to do for ISCII.
         Therefore, just convert string without checking */
      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
                                wbuf, NT_MAX_PATH);

In this case, the conversion for ISO-2022, ISCII and UTF-7 will
not be done correctly.

Or should we skip charset conversion if the codepage is EBCDIC, ISO-2022
or ISCII? What should we do for UTF-7?
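For a stateless encoding such as UTF-8, the split case that convert_mb_str() deals with is easy to picture. This is a sketch for UTF-8 only, not the MultiByteToWideChar-based logic Cygwin actually uses: the hypothetical utf8_complete_prefix() returns how many leading bytes end on a character boundary, so that an incomplete trailing sequence could be held back and prepended to the next write. It only looks at the tail and does not validate the rest of the buffer.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence introduced by lead byte b,
   or 0 if b is a continuation byte or not a valid lead byte. */
static size_t
utf8_seq_len (unsigned char b)
{
  if (b < 0x80) return 1;
  if (b >= 0xc2 && b <= 0xdf) return 2;
  if (b >= 0xe0 && b <= 0xef) return 3;
  if (b >= 0xf0 && b <= 0xf4) return 4;
  return 0;
}

/* Length of the longest prefix of buf that does not end in the middle
   of a multibyte character.  Only the last (up to) 4 bytes are examined. */
static size_t
utf8_complete_prefix (const unsigned char *buf, size_t len)
{
  size_t back = 0;
  /* Walk back over at most 3 continuation bytes (10xxxxxx). */
  while (back < len && back < 3 && (buf[len - 1 - back] & 0xc0) == 0x80)
    back++;
  if (back == len)
    return len;  /* no lead byte found: pass the bytes through unchanged */
  size_t need = utf8_seq_len (buf[len - 1 - back]);
  if (need > back + 1)
    return len - 1 - back;  /* lead byte with too few continuation bytes */
  return len;
}
```

With the bytes 'A' 0xE3 0x81 (the first two bytes of U+3042), it returns 1, leaving the two-byte fragment for the next call; with the complete 'A' 0xE3 0x81 0x82 it returns 4.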

Nothing, just like for any other of these weird charsets.  Cygwin never
supported any charset which wasn't at least ASCII compatible in the
0 <= x <= 127 range.

Actually, in Shift-JIS (CP932, supported via locale ja_JP.sjis), 0x5C is ¥ :/
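Shift-JIS has a second catch in the 0..127 range besides the ¥ glyph: the trail byte of a double-byte character may be 0x40..0x7E, so 0x5C can also be the second half of a character (katakana ソ is 0x83 0x5C). Because trail bytes also overlap the lead-byte ranges, a dangling lead byte at the end of a buffer can only be found by scanning from a known character boundary, not by looking at the last byte alone. A sketch, assuming the CP932 lead-byte ranges 0x81..0x9F and 0xE0..0xFC and treating every other byte as a single-byte character; sjis_complete_prefix() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed CP932 lead-byte ranges: 0x81..0x9F and 0xE0..0xFC. */
static bool
sjis_is_lead (unsigned char b)
{
  return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
}

/* Scan forward from the start of buf (assumed to be a character
   boundary), since trail bytes (0x40..0x7E, 0x80..0xFC) overlap both
   ASCII and the lead-byte ranges.  Returns the length of the prefix
   that ends on a character boundary. */
static size_t
sjis_complete_prefix (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      if (sjis_is_lead (buf[i]))
        {
          if (i + 1 == len)
            return i;  /* dangling lead byte: hold it back */
          i += 2;      /* lead + trail, even if the trail is 0x5C */
        }
      else
        i++;
    }
  return len;
}
```

For the bytes 0x83 0x5C 'A' 0x83 it returns 3: the 0x5C is consumed as the trail of ソ, and only the final 0x83 is held back. For 0x83 0x83 (ャ) it returns 2, even though the last byte alone looks like a lead byte.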

Just ignore them and the possibility that a
user chooses them for fun.

What should happen if a user or an app changes the codepage to one of them?

Garbage output, I guess.  We shouldn't really care.


Corinna
