[Daniel Shahaf]
> The current patch's docstring implies the LF byte is necessarily part
> of a line terminator, which is true for UTF-8/16/32 but not
> necessarily true in arbitrary encodings.

Nitpick: It is true in UTF-8, but not -16 or -32.  There are about 70
characters in the BMP which, in UTF-16LE (and -32LE), begin with 0A:

$ grep '^..0A;' /usr/share/misc/unicode.gz | head
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
010A;LATIN CAPITAL LETTER C WITH DOT ABOVE;Lu;0;L;0043 0307;;;;N;LATIN CAPITAL LETTER C DOT;;;010B;
020A;LATIN CAPITAL LETTER I WITH INVERTED BREVE;Lu;0;L;0049 0311;;;;N;;;;020B;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;
040A;CYRILLIC CAPITAL LETTER NJE;Lu;0;L;;;;;N;;;;045A;
050A;CYRILLIC CAPITAL LETTER KOMI NJE;Lu;0;L;;;;;N;;;;050B;
060A;ARABIC-INDIC PER TEN THOUSAND SIGN;Po;0;ET;;;;;N;;;;;
070A;SYRIAC CONTRACTION;Po;0;AL;;;;;N;;;;;
080A;SAMARITAN LETTER KAAF;Lo;0;R;;;;;N;;;;;
090A;DEVANAGARI LETTER UU;Lo;0;L;;;;;N;;;;;
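
To make the byte-level point concrete, here is a minimal Python sketch. It is only an illustration and has nothing to do with the patch's own code: the variable names are made up, and the "splitter" is just bytes.split() standing in for any byte-oriented line reader. It encodes U+010A three ways and then splits a UTF-16LE buffer on the LF byte.

```python
# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) is not a line feed,
# yet its little-endian encodings start with the byte 0x0A.
ch = "\u010a"

utf8 = ch.encode("utf-8")
utf16le = ch.encode("utf-16-le")
utf32le = ch.encode("utf-32-le")

print(utf8.hex())     # c48a      (no 0x0A byte anywhere)
print(utf16le.hex())  # 0a01      (low byte comes first)
print(utf32le.hex())  # 0a010000

# Stand-in for a byte-oriented line reader: treat every 0x0A as a
# terminator.  The text "a", U+010A, "b" contains no newline, but the
# split lands in the middle of the second code unit.
data = ("a" + ch + "b").encode("utf-16-le")
pieces = data.split(b"\n")
print(pieces)         # [b'a\x00', b'\x01b\x00']
```

The second piece starts with a stray 0x01 byte and no longer decodes to the original text, so in these encodings a bare 0A byte cannot be assumed to belong to a line terminator.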