On 2012-07-09 17:01, Julian Reschke wrote:
On 2012-07-09 16:48, Mike Jones wrote:
HTML5 is not cited because it's a working draft, not an approved
standard. In what way is "the definition of the media type in HTML4
is known to be insufficient"? People have been successfully
implementing form-urlencoding with it for quite some time. :-) Is
there a specific wording change you'd suggest that doesn't involve
citing a working draft rather than an approved standard?
For instance, the HTML4 "definition" doesn't even mention what to do
with non-ASCII characters.
I understand that citing a working draft is not particularly attractive,
but citing HTML4 just because it's a "standard" isn't really helpful for
people who actually follow the link and try to understand what needs to
be implemented.
...
Here's an attempt to describe the encoding in terms of HTML4, plus
additional instructions. The spec would need to reference this text
anyway wherever it currently refers to the HTML4 media type definition:
-- snip --
Appendix X. Use of the application/x-www-form-urlencoded Media Type
At the time of publication of this specification, the
"application/x-www-form-urlencoded" media type was defined in Section
17.13.4 of [HTML4], but not registered in the IANA media types registry
(<http://www.iana.org/assignments/media-types/index.html>). Furthermore,
the definition is incomplete, as it does not address non-US-ASCII
characters.
To address this shortcoming, when generating payloads using this media
type, names and values MUST be encoded using the "UTF-8" character
encoding scheme ([RFC3629]) first; the resulting octet sequence then
needs to be further encoded using the escaping rules defined in [HTML4].
When parsing data from a payload using this media type, the names and
values resulting from reversing the name/value encoding consequently
need to be treated as octet sequences, to be decoded using the "UTF-8"
character encoding scheme.
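The generate and parse rules above can be sketched in Python as a non-normative illustration. The helper names (`form_encode`, `form_decode`, `encode_pairs`, `decode_pairs`) are hypothetical, and the sketch assumes a conservative reading of the HTML4 escaping rule: a space becomes "+", ASCII letters and digits pass through, and every other octet becomes "%HH". HTML4 additionally asks for line breaks to be sent as CR LF; that normalization is omitted here.

```python
import string

HEX = set(string.hexdigits)


def form_encode(s: str) -> str:
    """Encode one name or value: UTF-8 first, then HTML4-style escaping."""
    out = []
    for octet in s.encode("utf-8"):            # step 1: UTF-8 octets (RFC 3629)
        ch = chr(octet)
        if ch.isascii() and ch.isalnum():
            out.append(ch)                     # ASCII letters/digits pass through
        elif octet == 0x20:
            out.append("+")                    # space becomes '+'
        else:
            out.append("%{:02X}".format(octet))  # any other octet becomes %HH
    return "".join(out)


def form_decode(s: str) -> str:
    """Undo the escaping to get octets, then decode those octets as UTF-8."""
    octets = bytearray()
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "+":
            octets.append(0x20)                # '+' stands for a space
            i += 1
        elif ch == "%":
            hh = s[i + 1:i + 3]
            if len(hh) != 2 or not set(hh) <= HEX:
                raise ValueError("malformed percent-escape at index %d" % i)
            octets.append(int(hh, 16))
            i += 3
        else:
            octets.extend(ch.encode("utf-8"))  # literal characters taken as-is
            i += 1
    return octets.decode("utf-8")              # raises UnicodeDecodeError if not UTF-8


def encode_pairs(pairs):
    """Join (name, value) pairs into a payload: name=value&name=value..."""
    return "&".join(form_encode(n) + "=" + form_encode(v) for n, v in pairs)


def decode_pairs(payload):
    """Split a payload into (name, value) pairs and decode each part."""
    pairs = []
    for part in payload.split("&"):
        if not part:
            continue                           # tolerate empty segments like "a=1&&b=2"
        name, _, value = part.partition("=")
        pairs.append((form_decode(name), form_decode(value)))
    return pairs
```

For instance, `encode_pairs([("name", "a b"), ("x", "\u20ac")])` yields `name=a+b&x=%E2%82%AC`. Many real encoders (Python's `urllib.parse.quote_plus`, for one) also leave "-", ".", "_" and "~" unescaped; a decoder such as `form_decode` accepts either form.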
Example: A value consisting of the six Unicode code points (1) U+0020
(SPACE), (2) U+0025 (PERCENT SIGN), (3) U+0026 (AMPERSAND), (4) U+002B
(PLUS SIGN), (5) U+00A3 (POUND SIGN), and (6) U+20AC (EURO SIGN) would
be encoded into the octet sequence below (using hexadecimal notation):
20 25 26 2B C2 A3 E2 82 AC
and then represented in the payload as:
+%25%26%2B%C2%A3%E2%82%AC
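The example can be cross-checked with Python's standard library, whose `urllib.parse.quote_plus` applies the same two steps by default (UTF-8 first, then "+" for space and "%HH" escapes). This is only a sanity check of the numbers above, not part of the proposed text.

```python
from urllib.parse import quote_plus, unquote_plus

# The six code points from the example: SPACE, PERCENT SIGN, AMPERSAND,
# PLUS SIGN, POUND SIGN, EURO SIGN.
value = "\u0020\u0025\u0026\u002B\u00A3\u20AC"

octets = value.encode("utf-8")
print(octets.hex(" ").upper())   # 20 25 26 2B C2 A3 E2 82 AC

payload = quote_plus(value)      # encoding defaults to UTF-8
print(payload)                   # +%25%26%2B%C2%A3%E2%82%AC

# Parsing reverses both steps and recovers the original value.
assert unquote_plus(payload) == value
```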
-- snip --
Best regards, Julian
_______________________________________________
OAuth mailing list
OAuth@ietf.org
https://www.ietf.org/mailman/listinfo/oauth