A UTF-8 bookmark title is occasionally not accepted
after searching for a UTF-8 string.
Either of the attached patches (patch-a or patch-b) fixes this problem.
(I noticed this problem some years ago.)

* How to reproduce:
lynx -display_charset=utf-8 -cmd_script=./cmd_log ./bkm.html

Commands in the cmd_log file:
 + Type /α^J to search for "α" (a UTF-8 string).
 + Type al^J to bookmark the link, accepting the title "α" (UTF-8)
   taken from the link name.

The "Title:" prompt keeps reappearing even after Enter (^J) is pressed
if the bookmark title consists only of UTF-8 8-bit bytes.
(Adding some ASCII characters to the title stops the prompt from repeating.)

---cmd_log file
key /
key 0xce
key 0xb1
key ^J
key a
key l
key ^J
---

---bkm.html
<html><head><meta charset="utf-8"></head><body>
<a href="bkm.html">α</a><br>β</body></html>
---

* Cause:
UPPER8(), called from the search command (/), feeds partial UTF-8 bytes
into the internal state of UCTransToUni().  (*Appendix A)
This leaves a broken UTF-8 multi-byte sequence
in UCTransToUni()'s internal state.

The "Title:" prompt for saving a bookmark is repeated
as long as the bookmark title has no visible characters.  (*Appendix B)
Lynx checks the title's visibility with havevisible(),
which uses UCTransToUni().
If the bookmark title consists only of UTF-8 8-bit bytes
and partial UTF-8 bytes of the search string remain
in the internal state of UCTransToUni(),
UCTransToUni() returns ucNeedMore almost every time,
so havevisible() returns FALSE.

* Patch:
Either patch-a or patch-b fixes this problem.
(diffs against lynx2.9.0)

+ Patch-b resets the internal state of UCTransToUni() before using it
  for the bookmark title check.
+ Patch-a makes DisplayCharsetMatchLocale get set as expected for UTF-8.
  This prevents UPPER8() from calling UCTransToUni() for UTF-8
  (unless the FORCE_8BIT_TOUPPER option is set to TRUE in lynx.cfg).
(Without patch-a, the condition of the else-if statement is always TRUE:
no MIME name can begin with both "cp" and "windows",
so at least one of the strncasecomp() calls returns nonzero.)

* Patch-b:
--- LYBookmark.c.orig   2023-01-08 01:09:53.000000000 +0900
+++ LYBookmark.c        2023-10-15 10:49:46.173275705 +0900
@@ -1029,6 +1029,7 @@ static BOOLEAN havevisible(const char *T
     unsigned char c;
     long unicode;
 
+    UCTransToUni(0, -1); /* reset internal state */
     for (; *p; p++) {
        c = UCH(TOASCII(*p));
        if (c > 32 && c < 127) {

* Patch-a:
--- LYCharSets.c.orig   2021-06-30 07:01:12.000000000 +0900
+++ LYCharSets.c        2024-01-30 21:11:04.235975860 +0900
@@ -608,8 +608,8 @@ static void HTMLSetDisplayCharsetMatchLo
        DisplayCharsetMatchLocale = TRUE;       /* old-style */
        return;
 
-    } else if (strncasecomp(LYCharSet_UC[i].MIMEname, "cp", 2) ||
-              strncasecomp(LYCharSet_UC[i].MIMEname, "windows", 7)) {
+    } else if (!strncasecomp(LYCharSet_UC[i].MIMEname, "cp", 2) ||
+              !strncasecomp(LYCharSet_UC[i].MIMEname, "windows", 7)) {
        /*
         * Assume dos/windows displays usually on remote terminal, hence it
         * rarely matches locale.  (In fact, MS Windows codepoints locale are

* Appendix A:
Trace of LYno_attr_mbcs_case_strstr(haystack="β", needle="α", utf_flag=1,):

Input data:
  haystack("β"): ce b2

  needle("α"):   ce b1
                [1] [0]

# UPPER8(*haystack, *needle)
UPPER8(0xce, 0xce)
# refptr=haystack+1;tstptr=needle+1;UPPER8(*refptr, *tstptr)
UPPER8(0xb2, 0xb1)
             [0] -> UCTransToUni(ch2=0xb1)
# haystack++;UPPER8(*haystack, *needle)
UPPER8(0xb2, 0xce)
             [1] -> UCTransToUni(ch2=0xce)

UCTransToUni buffer (broken UTF-8 sequence):
[0] [1]
b1  ce

LYStrings.c:LYno_attr_mbcs_case_strstr()
        for (; *haystack != '\0' && (result == NULL); haystack++) {
            if (...
                (0 == UPPER8(*haystack, *needle))) {
                ...
                refptr = (haystack + 1);
                tstptr = (needle + 1);
                ...
                while (1) {
                        ...
                        } else if (0 != UPPER8(*refptr, *tstptr)) {
                            break;
                        }

LYStrings.c:UPPER8()
    if (ch1 == ch2) {
    ...
    } else if (UCH(TOASCII(ch1)) > 127 &&
               UCH(TOASCII(ch2)) > 127) {
        if (DisplayCharsetMatchLocale) {
            result = (TOUPPER(ch1) - TOUPPER(ch2));
        } else {
            long uni_ch2 = UCTransToUni((char) ch2, current_char_set);

* Appendix B:
LYBookmark.c:save_bookmark_link()
    do {
        ...
        LYMBM_statusline(TITLE_PROMPT);
        LYgetBString(&string_data, FALSE, 0, NORECALL);
        ...
    } while (!havevisible(string_data->str));

LYBookmark.c:havevisible()
    BOOLEAN result = FALSE;
    for (; *p; p++) {
        ...
        unicode = UCTransToUni(*p, current_char_set);
        if (unicode == ucNeedMore)
            continue;
        ...
    }
    return (result);
