On Jun 28, 2011, at 10:43 AM, Guy Harris wrote:

> On Jun 28, 2011, at 10:27 AM, Guy Harris wrote:
>
>> when putting them into a textual representation of the protocol tree or
>> into columns or something else to be shown to humans, map them to UTF-8,
>> with anything that can't be mapped to UTF-8 - including, if the encoding is
>> putatively UTF-8, octet sequences that aren't valid UTF-8 sequences - shown
>> as the Unicode replacement character U+FFFD;
>
> ...and, for "for display" conversions, we might want to convert control
> characters to "Control Pictures" symbols (0x0000 to 0x001F convert to 0x2400
> to 0x241f: ␀, ␁, etc. through ␟; 0x007F converts to 0x2421, i.e. ␡ - in the
> font in which this message is being displayed to me, those have the control
> character abbreviations displayed in really really small letters, diagonally
> from upper left to lower right; unfortunately, I see nothing for C1 control
> characters).
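
The "for display" mapping quoted above can be sketched roughly as follows. This is only an illustration in Python, not Wireshark code: the names to_display and CONTROL_PICTURES are made up for the example, and it assumes the raw octets and their putative encoding are already in hand. C1 control characters are passed through unchanged, since there appear to be no Control Pictures for them.

```python
# Hypothetical sketch, not Wireshark code: names here are illustrative only.

# C0 controls U+0000..U+001F map to U+2400..U+241F; DEL (U+007F) maps to U+2421.
CONTROL_PICTURES = {cp: 0x2400 + cp for cp in range(0x20)}
CONTROL_PICTURES[0x7F] = 0x2421


def to_display(raw: bytes, encoding: str = "utf-8") -> str:
    """Return a human-displayable string for the given raw octets.

    Octet sequences that are not valid in the putative encoding are
    decoded as U+FFFD (the Unicode replacement character); C0 controls
    and DEL are then replaced by their Control Pictures symbols.
    C1 controls (U+0080..U+009F) are left as they are.
    """
    text = raw.decode(encoding, errors="replace")
    return text.translate(CONTROL_PICTURES)


if __name__ == "__main__":
    # A string with NUL, CR, LF, DEL and one invalid UTF-8 octet.
    print(to_display(b"a\x00b\r\n\x7f\xff"))
```

Note that this maps every C0 control, including tab, CR and LF, which may or may not be what is wanted for a given column or tree item.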

	http://en.wikipedia.org/wiki/Template:Unicode_chart_Control_Pictures

That claims that this is "as of Unicode 6.0", so, if true, either they have a different name for control pictures for C1 control characters or there aren't any.  (I have no idea what those other symbols are doing in there.)

U+FFFD is often shown as a white question mark inside a black diamond:

	http://en.wikipedia.org/wiki/Specials_(Unicode_block)

Oh, and if we're going to be extremely completist, there are the EBCDIC control characters, for which there are not always control pictures; see table 5.1:

	ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt

This was from 1998.  I don't know whether any of the proposals were accepted.
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe