> Well, like it or not, the default for HTTP header fields is not UTF-8.
Encoding in HTTP header fields is not the topic: error_description is already encoded into a URI before it is placed in the Location field.

There are 3 spots where error_description appears:

http://tools.ietf.org/html/draft-ietf-oauth-v2-16#section-4.1.2.1
http://tools.ietf.org/html/draft-ietf-oauth-v2-16#section-4.2.2.1
http://tools.ietf.org/html/draft-ietf-oauth-v2-16#section-5.2

In sections 4.1.2.1 and 4.2.2.1 the issue is the character encoding applied before the application/x-www-form-urlencoded encoding (after that it is ASCII only). In section 4.2.2.1, the parameter is encoded in the fragment component, which is only visible on the client side and is likely to be read by a script in JavaScript (which is Unicode only).

In section 5.2 the response type is JSON, which already deals with character encoding (http://tools.ietf.org/html/rfc4627#section-3) and is Unicode only. So there isn't anything to solve for error_description in section 5.2, except maybe to reference section 3 of RFC 4627.

Proposal for sections 4.1.2.1 and 4.2.2.1:

  error_description
    OPTIONAL. A human-readable text providing additional information, used to assist in the understanding and resolution of the error that occurred. The text should first be encoded as octets according to the UTF-8 character encoding before being encoded using the "application/x-www-form-urlencoded" format.

Examples:

  HTTP/1.1 302 Found
  Location: https://client.example.com/cb?error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9

  HTTP/1.1 302 Found
  Location: https://client.example.com/cb#error=access_denied&error_description=Acc%C3%A8s%20refus%C3%A9

Proposal for section 5.2:

  error_description
    OPTIONAL. A human-readable text providing additional information, used to assist in the understanding and resolution of the error that occurred. The text shall be encoded in Unicode as defined by [RFC4627].

Example:

  HTTP/1.1 400 Bad Request
  Content-Type: application/json
  Cache-Control: no-store

  {
    "error":"invalid_request",
    "error_description":"Accès refusé"
  }

For query strings encoded with application/x-www-form-urlencoded the most common default is UTF-8, while a response body encoded with application/x-www-form-urlencoded should set a charset parameter in the Content-Type header.

Here are examples of dealing with query strings in a few languages and app frameworks:

JavaScript (very relevant to section 4.2.2.1)

  > decodeURIComponent("Acc%C3%A8s%20refus%C3%A9") // UTF-8
  'Accès refusé'

  http://www.ecmascript.org/docs.php
  > The decodeURIComponent function computes a new version of a URI in which
  > each escape sequence and UTF-8 encoding of the sort that might be
  > introduced by the encodeURIComponent function is replaced with the
  > character that it represents.

NodeJS

  node
  > var querystring = require('querystring')
  > querystring.parse('error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9') // UTF-8
  { error: 'access_denied', error_description: 'Accès refusé' }

.Net

  Request.QueryString // UTF-8
  HttpUtility.ParseQueryString(String) // UTF-8
  HttpUtility.ParseQueryString(String, Encoding) // Need to know the encoding before the query string is parsed

Ruby

  # Rails 3 only parses query strings as UTF-8 but older versions use binary strings
  Rack::Request.params
  URI.decode_www_form_component(str, enc=Encoding::UTF_8)

Python 2 (binary string)

  python
  >>> from urlparse import parse_qs
  >>> parse_qs("error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9")
  {'error_description': ['Acc\xc3\xa8s refus\xc3\xa9'], 'error': ['access_denied']}

PHP (binary string)

  php -r 'parse_str("error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9", $output); print_r($output);'
  Array
  (
      [error] => access_denied
      [error_description] => Accès refusé
  )

Java

  ServletRequest.getParameter(String name) // Tomcat has 2 settings which govern query string parsing: URIEncoding, which defaults to ISO-8859-1, and useBodyEncodingForURI, which defaults to false
  URLDecoder.decode(String s, String enc) // Need to know the encoding before percent decoding

On May 18, 2011, at 10:48 PM, Julian Reschke wrote:

> On 2011-05-19 01:24, Kris Selden wrote:
>> Is there a problem with sticking to UTF-8? OAuth already mandates JSON which
>> is Unicode only.
>
>
>
>> Would be nice to keep it simple.
>>
>> I'm guessing without guidance, most would convert to UTF-8 and percent
>> encode anyway.
>
> Without guidance, people usually do not encode at all, and we'll see
> different encodings on the wire.
>
> Best regards, Julian

_______________________________________________
OAuth mailing list
OAuth@ietf.org
https://www.ietf.org/mailman/listinfo/oauth
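
For completeness, here is a minimal Python 3 sketch of the encoding step proposed above for sections 4.1.2.1 and 4.2.2.1: the text is encoded as UTF-8 octets and then as application/x-www-form-urlencoded. It is only an illustration, not proposed spec text. The variable names are mine; the callback URI and error text are taken from the examples above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Callback URI and parameters copied from the examples in this message.
callback = "https://client.example.com/cb"
params = {"error": "access_denied", "error_description": "Accès refusé"}

# Section 4.1.2.1 style: parameters in the query component.
# urlencode() encodes str values as UTF-8 octets, then percent-encodes them
# (spaces become "+" with the default quote_plus).
query_location = callback + "?" + urlencode(params, encoding="utf-8")

# Section 4.2.2.1 style: parameters in the fragment component.
# quote_via=quote writes spaces as "%20", matching the fragment example.
fragment_location = callback + "#" + urlencode(
    params, encoding="utf-8", quote_via=quote
)

print("Location:", query_location)
print("Location:", fragment_location)

# A receiver that assumes UTF-8 recovers the original text from either form.
from_query = parse_qs(urlsplit(query_location).query, encoding="utf-8")
from_fragment = parse_qs(urlsplit(fragment_location).fragment, encoding="utf-8")
print(from_query["error_description"][0])
print(from_fragment["error_description"][0])

# A receiver that assumes ISO-8859-1 (for instance the Tomcat URIEncoding
# default mentioned above) decodes the same octets into different characters.
mismatched = parse_qs(urlsplit(query_location).query, encoding="iso-8859-1")
print(mismatched["error_description"][0])
```

The last print shows why the sender and receiver have to agree on the encoding before percent decoding: the octets on the wire are identical, but the decoded text differs.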