Hi again,

On Wed, Feb 10, 2021 at 11:01:58PM -0000, Tavis Ormandy wrote:
> On 2021-02-10, Axel Beckert wrote:
> > +  else if (i < sizeof combchars / sizeof *combchars) {
>
> This doesn't seem right, I think it should be compared against the
> calloc param at the top of utf8_handle_comb(), but I don't really
> understand enough about unicode to know where that 0x802 comes from!
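A minimal sketch of why a sizeof-based bound cannot work here, assuming (as the calloc() call Tavis mentions suggests) that combchars is a pointer to a heap-allocated table rather than a real array: sizeof applied to a pointer yields the size of the pointer, not of the block it points to. The type struct combchar_entry and the names comb_table, COMB_TABLE_SLOTS, pointer_based_count(), comb_table_init(), comb_index_ok() and comb_lookup() are hypothetical illustrations, not code from screen's encoding.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry type; the real struct in screen may differ. */
struct combchar_entry {
    unsigned int c1;
    unsigned int c2;
};

/* Slot count taken from the calloc() parameter quoted above. */
#define COMB_TABLE_SLOTS 0x802

/* Heap-allocated table, reachable only through a pointer. */
struct combchar_entry **comb_table;

/* Wrong element count: both operands are pointer types, so this is
 * sizeof(pointer) / sizeof(pointer), typically 1, never 0x802. */
size_t pointer_based_count(void)
{
    return sizeof comb_table / sizeof *comb_table;
}

/* Allocate COMB_TABLE_SLOTS zeroed entries; returns 0 on failure. */
int comb_table_init(void)
{
    comb_table = calloc(COMB_TABLE_SLOTS, sizeof *comb_table);
    return comb_table != NULL;
}

/* Correct check: compare against the allocation size itself. */
int comb_index_ok(size_t i)
{
    return i < COMB_TABLE_SLOTS;
}

/* Bounds-checked read; returns NULL for out-of-range indices. */
struct combchar_entry *comb_lookup(size_t i)
{
    assert(comb_table != NULL); /* comb_table_init() must run first */
    return comb_index_ok(i) ? comb_table[i] : NULL;
}
```

After comb_table_init(), comb_lookup(0x900) returns NULL instead of reading past the 0x802-entry table, whereas a bound computed by pointer_based_count() would, on common platforms, reject every index except 0, i.e. the calculated size is far too small.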
Ack, I seem to have missed at least one level of dereference, so the
calculated size is probably too small.

> --- encoding.c	2020-02-05 12:09:38.000000000 -0800
> +++ encoding.c	2021-02-10 15:00:05.000000000 -0800
> @@ -1357,6 +1357,9 @@
>    int root, i, c1;
>    int isdouble;
>
> +  if (c > 0x801)
> +    return;
> +
>    c1 = mc->image | (mc->font << 8) | mc->fontx << 16;
>    isdouble = c1 >= 0x1100 && utf8_isdouble(c1);
>    if (!combchars)

While that fix does fix the crash, as mine did (probably by accident
:-), in the meantime I found a rendering issue with both:

I currently assume that this code handles combining diacriticals,
i.e. Unicode characters which modify the preceding character. Since
they can be stacked, and Tavis mentioned that he thinks the code only
handles UTF-8 characters of at most two bytes, I toyed around with
multiple combining diacriticals in a row. (Yes, I'm aware that these
are not "more than two bytes" with regard to the limit mentioned
above.)

Without any patch, screen correctly rendered a combination such as
"e̤̒", which consists of

 * the ASCII letter "e",
 * U+0324 COMBINING DIAERESIS BELOW (two bytes in UTF-8), and
 * U+0312 COMBINING TURNED COMMA ABOVE (two bytes in UTF-8).

With either Tavis's patch or mine, only U+0324 COMBINING DIAERESIS
BELOW is rendered, and U+0312 COMBINING TURNED COMMA ABOVE is not
shown.

Then again, "l᪼" (ASCII "l" + U+1ABC COMBINING DOUBLE PARENTHESES
ABOVE, which has a three-byte representation in UTF-8 and is clearly
above 0x800) is rendered correctly without a patch as well as with
your patch and mine. The same goes for "e𝆫" (ASCII "e" + U+1D1AB
MUSICAL SYMBOL COMBINING UP BOW, which has a four-byte representation
in UTF-8).

Will test Michael's second patch proposal later today. Really looking
forward to that.
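To double-check the byte counts quoted above, here is a small UTF-8 encoder sketch. The helper names utf8_len() and utf8_encode() are hypothetical, not screen functions. It follows the standard UTF-8 length boundaries: code points below 0x80 take one byte, below 0x800 two, below 0x10000 three, and the rest four. Note that 0x800 is exactly where two-byte sequences end; whether that is where the 0x800/0x802 constants in utf8_handle_comb() come from is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode code point cp in UTF-8.
 * Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
size_t utf8_len(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Encode cp into out (room for at least 4 bytes); returns byte count. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    size_t n = utf8_len(cp);

    switch (n) {
    case 1:
        out[0] = (unsigned char)cp;
        break;
    case 2: /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3: /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default: /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return n;
}
```

With it, the test strings above can be written as plain C literals, e.g. "e\xCC\xA4\xCC\x92" for "e̤̒", "l\xE1\xAA\xBC" for "l᪼" and "e\xF0\x9D\x86\xAB" for "e𝆫".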
:-)

		Kind regards, Axel
-- 
PGP: 2FF9CD59612616B5      /~\  Plain Text Ribbon Campaign, http://arc.pasp.de/
Mail: a...@deuxchevaux.org  \ /  Say No to HTML in E-Mail and Usenet
Mail+Jabber: a...@noone.org  X
https://axel.beckert.ch/    / \  I love long mails: https://email.is-not-s.ms/