Re: URI replacement pseudocode

Mark J. Reed Mon, 17 May 2010 12:29:29 -0700

On Mon, May 17, 2010 at 3:00 PM, Aaron Sherman <a...@ajs.com> wrote:
> FFFE and FEFF are used to manage byte-ordering, so they really shouldn't be
> part of a URI (URIs should exist in a context in which byte ordering is
> assured, would be my take).


Neither U+FFFE nor U+FFFF is a valid character, but U+FEFF is
perfectly cromulent, if deprecated in that role: it's the ZERO WIDTH
NO-BREAK SPACE (U+2060 WORD JOINER is the modern replacement). The
choice of code point for the byte-order mark was well-considered: if
U+FEFF is interpreted as a character instead of a BOM, it's a pretty
harmless character.
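
To make the BOM point concrete, here is a minimal Python sketch. The
helper name detect_utf16_order is made up for illustration; the
codecs constants and the codec names are from the standard library.
It assumes UTF-16 input and nothing else.

```python
import codecs

def detect_utf16_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little"
    return "unknown"

sample = b"\xfe\xff\x00a"   # BOM followed by "a", big-endian

# The generic "utf-16" codec consumes the BOM as a byte-order signal.
as_bom = sample.decode("utf-16")

# With the order fixed in advance, U+FEFF survives as an ordinary
# (invisible, zero-width) character at the start of the string.
as_char = sample.decode("utf-16-be")
```

In the second decode the leading U+FEFF is kept, which is the
"harmless character" case: it has no width and does not break lines.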

> The Unicode spec says that FFFF is guaranteed not to be a valid Unicode
> character, but does not explain why. [
> http://unicode.org/charts/PDF/UFFF0.pdf]

The Unicode specification is a lot more than code charts. See section
15.8, "Noncharacters", for discussion of these code points. U+FFFF (and
U+xFFFF for every plane x from 0 through 0x10) is a noncharacter so
that it can be used as a sentinel value within application memory, for
instance. U+FFFE, on the other hand, is illegal precisely because it's
the byte-swapped form of the BOM: if you read it at the start of a
stream, you know you have the byte order backwards.
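
A rough Python sketch of both points, under a few assumptions: the
function names is_noncharacter and byte_swap16 are invented here, and
the check also covers U+xFFFE in every plane and the U+FDD0..U+FDEF
block, which the standard lists as noncharacters as well even though
I didn't mention them above.

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the 66 Unicode noncharacters (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xFDD0 <= cp <= 0xFDEF:           # contiguous block of 32 in the BMP
        return True
    return (cp & 0xFFFE) == 0xFFFE       # last two code points of each plane

def byte_swap16(unit: int) -> int:
    """Swap the two bytes of a 16-bit code unit."""
    return ((unit & 0xFF) << 8) | (unit >> 8)

# Reading the BOM with the wrong byte order yields U+FFFE, which can
# never be a real character, so the mistake is detectable.
swapped = byte_swap16(0xFEFF)
```

Because the swapped value is guaranteed to be a noncharacter, a
decoder that sees it first can safely conclude the stream is in the
opposite byte order rather than wonder whether it is real text.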

-- 
Mark J. Reed <markjr...@gmail.com>
