A UTF-8 bookmark title is occasionally not accepted
after searching for a UTF-8 string.
Either of the attached patches (patch-a or patch-b) fixes this problem.
(I noticed this problem some years ago.)
* How to reproduce:
lynx -display_charset=utf-8 -cmd_script=./cmd_log ./bkm.html
Commands in the cmd_log file:
+ Type /α^J to search for "α" (a UTF-8 string).
+ Type al^J to bookmark the link, using its name "α" (UTF-8) as the title.
The "Title:" prompt repeats even after pressing Enter (^J)
if the bookmark title consists only of UTF-8 8-bit bytes.
(Adding some ASCII bytes avoids the repeated prompt.)
---cmd_log file
key /
key 0xce
key 0xb1
key ^J
key a
key l
key ^J
---
---bkm.html
<html><head><meta charset="utf-8"></head><body>
<a href="bkm.html">α</a><br>β</body></html>
---
* Cause:
UPPER8(), called from the search command (/), feeds partial UTF-8 bytes
into the internal state of UCTransToUni(). (*Appendix A)
This leaves a broken UTF-8 multi-byte sequence
in UCTransToUni()'s internal state.
When saving a bookmark, the "Title:" prompt is repeated
as long as the bookmark title has no visible characters. (*Appendix B)
Lynx checks whether the title is visible with havevisible(),
which uses UCTransToUni().
If the bookmark title consists only of UTF-8 8-bit bytes
and partial UTF-8 bytes of the search string remain
in the internal state of UCTransToUni(),
UCTransToUni() returns ucNeedMore almost every time
(so havevisible() returns FALSE).
* Patch:
Either patch-a or patch-b fixes this problem
(diffs against lynx2.9.0).
+ Patch-b resets the internal state of UCTransToUni() before using it
for the bookmark title check.
+ Patch-a fixes DisplayCharsetMatchLocale so that it is set
as expected for UTF-8.
This prevents UPPER8() from calling UCTransToUni() for UTF-8
(unless the FORCE_8BIT_TOUPPER option is set to TRUE in lynx.cfg).
(Without patch-a, the condition of the "cp"/"windows" else-if is always TRUE:
strncasecomp() returns 0 only on a match, and no MIME name can start
with both "cp" and "windows", so at least one call returns nonzero.)
* Patch-b:
--- LYBookmark.c.orig 2023-01-08 01:09:53.000000000 +0900
+++ LYBookmark.c 2023-10-15 10:49:46.173275705 +0900
@@ -1029,6 +1029,7 @@ static BOOLEAN havevisible(const char *T
unsigned char c;
long unicode;
+ UCTransToUni(0, -1); /* reset internal state */
for (; *p; p++) {
c = UCH(TOASCII(*p));
if (c > 32 && c < 127) {
* Patch-a:
--- LYCharSets.c.orig 2021-06-30 07:01:12.000000000 +0900
+++ LYCharSets.c 2024-01-30 21:11:04.235975860 +0900
@@ -608,8 +608,8 @@ static void HTMLSetDisplayCharsetMatchLo
DisplayCharsetMatchLocale = TRUE; /* old-style */
return;
- } else if (strncasecomp(LYCharSet_UC[i].MIMEname, "cp", 2) ||
- strncasecomp(LYCharSet_UC[i].MIMEname, "windows", 7)) {
+ } else if (!strncasecomp(LYCharSet_UC[i].MIMEname, "cp", 2) ||
+ !strncasecomp(LYCharSet_UC[i].MIMEname, "windows", 7)) {
/*
* Assume dos/windows displays usually on remote terminal, hence it
* rarely matches locale. (In fact, MS Windows codepoints locale are
* Appendix A:
Trace of LYno_attr_mbcs_case_strstr(haystack="β", needle="α", utf_flag=1,):
Input data:
haystack("β"): ce b2
needle("α"):   ce b1
               [1] [0]   (slot each needle byte later takes in the UCTransToUni buffer)
# UPPER8(*haystack, *needle)
UPPER8(0xce, 0xce)
# refptr=haystack+1;tstptr=needle+1;UPPER8(*refptr, *tstptr)
UPPER8(0xb2, 0xb1)
[0] -> UCTransToUni(ch2=0xb1)
# haystack++;UPPER8(*haystack, *needle)
UPPER8(0xb2, 0xce)
[1] -> UCTransToUni(ch2=0xce)
UCTransToUni buffer (broken UTF-8 sequence):
[0] [1]
b1 ce
LYStrings.c:LYno_attr_mbcs_case_strstr()
for (; *haystack != '\0' && (result == NULL); haystack++) {
if (...
(0 == UPPER8(*haystack, *needle))) {
...
refptr = (haystack + 1);
tstptr = (needle + 1);
...
while (1) {
...
} else if (0 != UPPER8(*refptr, *tstptr)) {
break;
}
LYStrings.c:UPPER8()
if (ch1 == ch2) {
...
} else if (UCH(TOASCII(ch1)) > 127 &&
UCH(TOASCII(ch2)) > 127) {
if (DisplayCharsetMatchLocale) {
result = (TOUPPER(ch1) - TOUPPER(ch2));
} else {
long uni_ch2 = UCTransToUni((char) ch2, current_char_set);
* Appendix B:
LYBookmark.c:save_bookmark_link()
do {
...
LYMBM_statusline(TITLE_PROMPT);
LYgetBString(&string_data, FALSE, 0, NORECALL);
...
} while (!havevisible(string_data->str));
LYBookmark.c:havevisible()
BOOLEAN result = FALSE;
for (; *p; p++) {
...
unicode = UCTransToUni(*p, current_char_set);
if (unicode == ucNeedMore)
continue;
...
}
return (result);