Martin v. Löwis <mar...@v.loewis.de> added the comment:

> The logic suggested by Martin in msg120018 looks right to me, but the
> whole code seems to be unnecessarily complex. (And comb1==comb may
> need to be changed to comb1>=comb.) I don't understand why linear
> search through "skipped" array is needed. At the very least instead
> of adding their positions to the "skipped" list, used combining
> characters can be replaced by a non-character to be later skipped.

The skipped array keeps track of which characters have been integrated into a base character, since those must not appear in the output. Assume you have a sequence B,C,N,C,N,B (B: base character, C: combined, N: not combined). You need to remember not to output C, whereas you still need to output N. I don't think replacing them with a non-character can work: which one would you choose that cannot also appear in the input?

The worst case (wrt. cskipped) is the maximum number of characters that can get combined into a single base character. It used to be (and I hope still is) 20 (decomposition of U+FDFA).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10254>
_______________________________________
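
The bookkeeping described in the message above can be illustrated with a minimal Python sketch. It is not the C code under discussion: the function name `compose`, the two-entry `COMPOSE` table, and the use of a set for `skipped` are assumptions made for illustration, and the blocking test uses the `>=` comparison suggested in the quoted comment. The input is a B,C,N,C,N,B sequence: the two C marks are merged into the base and recorded as skipped, while the two N marks are still emitted.

```python
import unicodedata

# Toy composition table (an assumption for illustration only); a real
# implementation consults the full Unicode composition data.
COMPOSE = {
    ("a", "\u0323"): "\u1ea1",       # a + combining dot below
    ("\u1ea1", "\u0302"): "\u1ead",  # a-with-dot-below + combining circumflex
}


def compose(text):
    """Compose canonically ordered, decomposed text using COMPOSE."""
    chars = list(text)
    out = []
    skipped = set()  # indices of marks already merged into a base character
    for i, ch in enumerate(chars):
        if i in skipped:
            continue  # C: already part of an earlier base, must not be output
        base = ch
        if unicodedata.combining(base) == 0:
            last_cc = 0  # combining class of the last mark left uncombined
            j = i + 1
            while j < len(chars):
                cc = unicodedata.combining(chars[j])
                if cc == 0:
                    break  # next base character ends the scan
                blocked = last_cc >= cc
                merged = None if blocked else COMPOSE.get((base, chars[j]))
                if merged is not None:
                    base = merged
                    skipped.add(j)  # remember not to output this mark
                else:
                    last_cc = cc    # N: stays in the output, may block later marks
                j += 1
        out.append(base)
    return "".join(out)


# B, C, N, C, N, B: a, dot below, ring below, circumflex, acute, b
sample = "a\u0323\u0325\u0302\u0301b"
result = compose(sample)
```

Here `result` keeps the ring below and the acute accent after the composed base, which is why the positions of merged marks have to be tracked separately rather than inferred from the character values.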