Hello Unicode Experts!

Suppose an application splits a UTF-8 multi-octet sequence and sends the pieces to a client, which must then restore the original sequence.

Question: is it possible to split a UTF-8 multi-octet sequence in such a way that the client cannot unambiguously restore the original sequence?

Here is the source of my question. The iCalendar specification [RFC 5545] says that long lines should be folded:

    Long content lines SHOULD be split into a multiple line
    representations using a line "folding" technique. That is, a long
    line can be split between any two characters by inserting a CRLF
    immediately followed by a single linear white-space character
    (i.e., SPACE or HTAB).

The RFC says that, when parsing a content line, folded lines must first be unfolded as follows:

    Unfolding is accomplished by removing the CRLF and the linear
    white-space character that immediately follows.

The RFC acknowledges that simple implementations might generate improperly folded lines:

    Note: It is possible for very simple implementations to generate
    improperly folded lines in the middle of a UTF-8 multi-octet
    sequence. For this reason, implementations need to unfold lines
    in such a way to properly restore the original sequence.

Can you provide an example of folding a UTF-8 multi-octet sequence such that there is no unambiguous way to restore the original sequence?

/Roger
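
P.S. To make the scenario concrete, here is a minimal Python sketch of the kind of "very simple implementation" the note describes. The helper names `fold` and `unfold`, the sample SUMMARY line, and the 12-octet limit are my own assumptions for illustration (the RFC recommends 75 octets); nothing here is taken from the RFC or from any particular library. The sketch folds at a fixed octet offset, which happens to land inside the two-octet encoding of "é", and then unfolds on the raw octets before decoding.

```python
import re

CRLF = b"\r\n"

def fold(line: bytes, limit: int = 75) -> bytes:
    """Naively fold so each physical line has at most `limit` octets.

    Deliberately ignores UTF-8 character boundaries (hypothetical helper).
    """
    chunks = [line[:limit]]
    rest = line[limit:]
    while rest:
        # Continuation lines start with one SPACE, leaving limit - 1 octets.
        chunks.append(b" " + rest[:limit - 1])
        rest = rest[limit - 1:]
    return CRLF.join(chunks)

def unfold(data: bytes) -> bytes:
    """Remove each CRLF plus the single SPACE or HTAB that follows it."""
    return re.sub(rb"\r\n[ \t]", b"", data)

# Sample content line; "é" is 2 octets and "€" is 3 octets in UTF-8.
line = "SUMMARY:Café €5".encode("utf-8")

# A limit of 12 octets cuts between the two octets of "é".
folded = fold(line, limit=12)
print(folded)

# Unfolding on octets first, then decoding, gives back the original text.
restored = unfold(folded)
print(restored.decode("utf-8"))
```

In this one case, the first physical line is not valid UTF-8 on its own, yet removing the CRLF and the following SPACE at the octet level restores the original octets exactly. What I cannot tell is whether some other split exists for which this does not work.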