A UTF-8 bookmark title is occasionally not accepted after searching for a UTF-8 string. The attached patch-a or patch-b fixes this problem. (I noticed this problem some years ago.)

* How to reproduce:
  lynx -display_charset=utf-8 -cmd_script=./cmd_log ./bkm.html

  Commands in the cmd_log file:
  + Input /α^J to search for "α" (a UTF-8 string).
  + Input al^J to bookmark the link, with the title "α" (UTF-8) taken from the link name.

  The "Title:" prompt repeats even after pressing the Enter key (^J) if the
  bookmark title consists of UTF-8 8-bit bytes only.
  (Adding some ASCII bytes avoids the repeated prompts.)

  ---cmd_log file
  key /
  key 0xce
  key 0xb1
  key ^J
  key a
  key l
  key ^J
  ---

  ---bkm.html
  <html><head><meta charset="utf-8"></head><body>
  <a href="bkm.html">α</a><br>β</body></html>
  ---

* Cause:
  UPPER8(), called from the search command (/), feeds partial UTF-8 bytes
  into the internal state of UCTransToUni(). (*Appendix A)
  This breaks the UTF-8 multi-byte sequence held in UCTransToUni()'s
  internal state.

  The "Title:" prompt shown when saving a bookmark is repeated as long as
  the bookmark title has no visible characters. (*Appendix B)
  Lynx checks the visibility of the bookmark title with havevisible(),
  which uses UCTransToUni().
  If the bookmark title consists of UTF-8 8-bit bytes only and partial
  UTF-8 bytes of the search string remain in the internal state of
  UCTransToUni(), UCTransToUni() returns ucNeedMore almost every time
  (and havevisible() then returns FALSE).

* Patch:
  Either patch-a or patch-b fixes this problem. (diff from lynx2.9.0)
  + Patch-b resets the internal state of UCTransToUni() before it is used
    for the bookmark title check.
  + Patch-a fixes the DisplayCharsetMatchLocale value so that it is set as
    expected for UTF-8. This prevents UPPER8() from calling UCTransToUni()
    for UTF-8 (unless the FORCE_8BIT_TOUPPER option is set to TRUE in
    lynx.cfg).
    (Without patch-a, the condition of the if statement is always TRUE.)
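
The remark that the unpatched condition is always TRUE can be checked in isolation. What follows is a minimal sketch, not Lynx code. It uses POSIX strncasecmp() as a stand-in for Lynx's strncasecomp(), on the assumption that both return 0 on a match (which the `!` added by patch-a implies). The names old_condition(), new_condition() and check_conditions() are made up for illustration, and the charset names are only examples.

```c
#include <assert.h>
#include <strings.h>    /* POSIX strncasecmp(); stand-in for strncasecomp() */

/* Hypothetical helper: the condition as written before patch-a. */
int old_condition(const char *mime)
{
    /*
     * A name cannot start with both "cp" and "windows", so at least one
     * comparison is nonzero and the whole expression is always true.
     */
    return strncasecmp(mime, "cp", 2) || strncasecmp(mime, "windows", 7);
}

/* Hypothetical helper: the condition as written after patch-a. */
int new_condition(const char *mime)
{
    /* True only when the name really starts with "cp" or "windows". */
    return !strncasecmp(mime, "cp", 2) || !strncasecmp(mime, "windows", 7);
}

/* Self-check; call it from main() to run the sketch. */
void check_conditions(void)
{
    assert(old_condition("utf-8"));         /* taken even for UTF-8 */
    assert(old_condition("cp437"));
    assert(old_condition("windows-1252"));

    assert(!new_condition("utf-8"));        /* UTF-8 no longer matches */
    assert(new_condition("cp437"));
    assert(new_condition("windows-1252"));
}
```

With the old form the dos/windows branch is taken for every charset name, including utf-8. With the new form it is taken only for names beginning with "cp" or "windows".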

* Patch-b:
--- LYBookmark.c.orig	2023-01-08 01:09:53.000000000 +0900
+++ LYBookmark.c	2023-10-15 10:49:46.173275705 +0900
@@ -1029,6 +1029,7 @@ static BOOLEAN havevisible(const char *T
     unsigned char c;
     long unicode;
 
+    UCTransToUni(0, -1);	/* reset internal state */
     for (; *p; p++) {
 	c = UCH(TOASCII(*p));
 	if (c > 32 && c < 127) {

* Patch-a:
--- LYCharSets.c.orig	2021-06-30 07:01:12.000000000 +0900
+++ LYCharSets.c	2024-01-30 21:11:04.235975860 +0900
@@ -608,8 +608,8 @@ static void HTMLSetDisplayCharsetMatchLo
 		DisplayCharsetMatchLocale = TRUE;	/* old-style */
 		return;
-	    } else if (strncasecomp(LYCharSet_UC[i].MIMEname, "cp", 2) ||
-		       strncasecomp(LYCharSet_UC[i].MIMEname, "windows", 7)) {
+	    } else if (!strncasecomp(LYCharSet_UC[i].MIMEname, "cp", 2) ||
+		       !strncasecomp(LYCharSet_UC[i].MIMEname, "windows", 7)) {
 		/*
 		 * Assume dos/windows displays usually on remote terminal, hence it
 		 * rarely matches locale.  (In fact, MS Windows codepoints locale are

* Appendix A:
  Trace of LYno_attr_mbcs_case_strstr(haystack="β", needle="α", utf_flag=1,):

  Input data:
    haystack("β"): ce  b2
    needle("α"):   ce  b1
                   [1] [0]

  # UPPER8(*haystack, *needle)
  UPPER8(0xce, 0xce)
  # refptr=haystack+1;tstptr=needle+1;UPPER8(*refptr, *tstptr)
  UPPER8(0xb2, 0xb1)
               [0] -> UCTransToUni(ch2=0xb1)
  # haystack++;UPPER8(*haystack, *needle)
  UPPER8(0xb2, 0xce)
               [1] -> UCTransToUni(ch2=0xce)

  UCTransToUni buffer (broken UTF-8 sequence):
    [0] [1]
    b1  ce

  LYStrings.c:LYno_attr_mbcs_case_strstr()
    for (; *haystack != '\0' && (result == NULL); haystack++) {
        if (... (0 == UPPER8(*haystack, *needle))) {
            ...
            refptr = (haystack + 1);
            tstptr = (needle + 1);
            ...
            while (1) {
                ...
                } else if (0 != UPPER8(*refptr, *tstptr)) {
                    break;
                }

  LYStrings.c:UPPER8()
    if (ch1 == ch2) {
        ...
    } else if (UCH(TOASCII(ch1)) > 127 && UCH(TOASCII(ch2)) > 127) {
        if (DisplayCharsetMatchLocale) {
            result = (TOUPPER(ch1) - TOUPPER(ch2));
        } else {
            long uni_ch2 = UCTransToUni((char) ch2, current_char_set);

* Appendix B:
  LYBookmark.c:save_bookmark_link()
    do {
        ...
        LYMBM_statusline(TITLE_PROMPT);
        LYgetBString(&string_data, FALSE, 0, NORECALL);
        ...
    } while (!havevisible(string_data->str));

  LYBookmark.c:havevisible()
    BOOLEAN result = FALSE;
    for (; *p; p++) {
        ...
        unicode = UCTransToUni(*p, current_char_set);
        if (unicode == ucNeedMore)
            continue;
        ...
    }
    return (result);
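
The failure described under Cause and traced in Appendix A can be reproduced with a toy model. The sketch below is not Lynx's implementation. It assumes only what the report states: the decoder keeps buffered bytes between calls, and leftover bytes stop later input from decoding. Every name in it (toy_reset(), toy_trans_to_uni(), toy_pollute(), toy_havevisible(), toy_check(), TOY_NEED_MORE) is hypothetical. The buffer size, the overflow policy and the limit to 2-byte sequences are simplifications chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NEED_MORE (-1L)     /* hypothetical stand-in for ucNeedMore */

/* Decoder state that survives between calls; this is the point at issue. */
static unsigned char toy_buf[8];
static size_t toy_len = 0;

/* Plays the role of the reset call that patch-b adds. */
void toy_reset(void)
{
    toy_len = 0;
}

/* Append one byte and decode only a clean 2-byte sequence at buffer start. */
long toy_trans_to_uni(unsigned char c)
{
    if (toy_len == sizeof(toy_buf))
        toy_len = 0;            /* toy overflow policy: drop everything */
    toy_buf[toy_len++] = c;

    if (toy_len == 2
        && (toy_buf[0] & 0xE0) == 0xC0      /* lead byte 110xxxxx */
        && (toy_buf[1] & 0xC0) == 0x80) {   /* continuation 10xxxxxx */
        long u = ((long) (toy_buf[0] & 0x1F) << 6) | (toy_buf[1] & 0x3F);

        toy_len = 0;
        return u;
    }
    return TOY_NEED_MORE;
}

/* Leave the bytes b1 ce behind, as the search for "α" does in Appendix A. */
void toy_pollute(void)
{
    toy_reset();
    (void) toy_trans_to_uni(0xB1);
    (void) toy_trans_to_uni(0xCE);
}

/* Simplified visibility check; reset_first selects the patch-b behaviour. */
int toy_havevisible(const char *title, int reset_first)
{
    const unsigned char *p = (const unsigned char *) title;

    if (reset_first)
        toy_reset();
    for (; *p; p++) {
        if (*p > 32 && *p < 127)
            return 1;           /* visible ASCII, decoder not consulted */
        if (*p <= 32 || *p == 127)
            continue;           /* blank or control byte */
        if (toy_trans_to_uni(*p) != TOY_NEED_MORE)
            return 1;           /* a complete non-ASCII character */
    }
    return 0;
}

/* Self-check; call it from main() to run the sketch. */
void toy_check(void)
{
    toy_pollute();
    assert(toy_havevisible("\xce\xb1", 0) == 0);    /* "α" is rejected */
    toy_pollute();
    assert(toy_havevisible("a\xce\xb1", 0) == 1);   /* ASCII byte rescues it */
    toy_pollute();
    assert(toy_havevisible("\xce\xb1", 1) == 1);    /* reset first: accepted */
}
```

In this model the stale bytes b1 ce sit in front of the title bytes ce b1, so the buffer never starts with a valid lead byte and every call reports that more input is needed. Clearing the state first lets ce b1 decode to U+03B1.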