Re: [OAUTH-WG] Language encoding in error_description

Kris Selden Thu, 19 May 2011 02:14:48 -0700

> Well, like it or not, the default for HTTP header fields is not UTF-8.


Encoding in HTTP header fields is not the topic: error_description is already 
encoded into a URI before it is placed in the Location field. 

There are 3 spots where error_description appears:
http://tools.ietf.org/html/draft-ietf-oauth-v2-16#section-4.1.2.1
http://tools.ietf.org/html/draft-ietf-oauth-v2-16#section-4.2.2.1
http://tools.ietf.org/html/draft-ietf-oauth-v2-16#section-5.2

In sections 4.1.2.1 and 4.2.2.1 the issue is the character encoding applied 
before application/x-www-form-urlencoded encoding (after that it is ASCII 
only). In section 4.2.2.1, the parameter is encoded in the fragment component, 
which is only visible on the client side and likely to be read by a script in 
JavaScript (which is Unicode only).

In section 5.2 the response type is JSON which already deals with character 
encoding (http://tools.ietf.org/html/rfc4627#section-3) and is Unicode only.  
So there isn't anything to solve for error_description in section 5.2, except 
maybe to reference section 3 of rfc4627.

 
Proposal for sections 4.1.2.1 and 4.2.2.1:

error_description
         OPTIONAL.  A human-readable text providing additional
         information, used to assist in the understanding and resolution
         of the error occurred.  The text should first be encoded as
         octets according to the UTF-8 character encoding before being
         encoded using the "application/x-www-form-urlencoded" format.

Examples:
HTTP/1.1 302 Found
Location: https://client.example.com/cb?error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9

HTTP/1.1 302 Found
Location: https://client.example.com/cb#error=access_denied&error_description=Acc%C3%A8s%20refus%C3%A9
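
As a rough illustration of the two-step encoding proposed above (UTF-8 octets 
first, then application/x-www-form-urlencoded), this Python 3 sketch reproduces 
the query and fragment strings from both examples. The variable names are mine, 
and using %20 rather than + in the fragment simply mirrors the second example; 
it is not something the draft requires.

```python
from urllib.parse import quote, urlencode

params = {
    "error": "access_denied",
    "error_description": "Accès refusé",
}

# urlencode() converts str values to UTF-8 octets by default and then
# percent-encodes them; spaces become "+" (quote_plus is the default).
query = urlencode(params)

# Same octets, but spaces become "%20", as in the fragment example.
fragment = urlencode(params, quote_via=quote)

print("https://client.example.com/cb?" + query)
print("https://client.example.com/cb#" + fragment)
```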


Proposal for section 5.2:

error_description
         OPTIONAL.  A human-readable text providing additional
         information, used to assist in the understanding and resolution
         of the error occurred.  The text shall be encoded in Unicode
         as defined by [RFC4627].

Example:
HTTP/1.1 400 Bad Request
Content-Type: application/json
Cache-Control: no-store

{
  "error":"invalid_request",
  "error_description":"Accès refusé"
}
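
For the JSON case, a small Python 3 sketch (my own illustration, not part of 
the draft) shows that the same description can travel either as raw UTF-8 
octets or as \uXXXX escapes, and that a JSON parser recovers the same Unicode 
string from both.

```python
import json

body = {"error": "invalid_request", "error_description": "Accès refusé"}

# Raw non-ASCII characters, serialized as UTF-8 octets.
utf8_octets = json.dumps(body, ensure_ascii=False).encode("utf-8")

# ASCII-only form using \uXXXX escapes (the json.dumps default).
escaped_octets = json.dumps(body).encode("ascii")

# Both forms decode to the same Unicode text.
from_utf8 = json.loads(utf8_octets.decode("utf-8"))
from_escaped = json.loads(escaped_octets.decode("ascii"))
print(from_utf8 == from_escaped)
```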


For query strings encoded with application/x-www-form-urlencoded the most 
common default is UTF-8, while a response body encoded with 
application/x-www-form-urlencoded should set a charset parameter in the 
Content-Type header.
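
A hedged Python 3 sketch of why the charset parameter matters for a 
form-encoded body: the receiver has to read it before percent-decoding. The 
header value and body below are made up for illustration, and 
email.message.Message is used only as a convenient standard-library parser for 
the Content-Type header.

```python
from email.message import Message
from urllib.parse import parse_qs

# Hypothetical response: the sender chose ISO-8859-1, not UTF-8.
content_type = "application/x-www-form-urlencoded; charset=ISO-8859-1"
body = "error=access_denied&error_description=Acc%E8s+refus%E9"

msg = Message()
msg["Content-Type"] = content_type
# Assumed fallback when no charset parameter is present.
charset = msg.get_content_charset() or "utf-8"

# Percent-decoded octets are interpreted using the declared charset.
fields = parse_qs(body, encoding=charset)
print(fields["error_description"][0])
```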

Here are examples of dealing with query strings in a few languages and app 
frameworks:

JavaScript (very relevant to section 4.2.2.1)
> decodeURIComponent("Acc%C3%A8s%20refus%C3%A9") // UTF-8
'Accès refusé'

http://www.ecmascript.org/docs.php
> The decodeURIComponent function computes a new version of a URI in which each
> escape sequence and UTF-8 encoding of the sort that might be introduced by the
> encodeURIComponent function is replaced with the character that it represents.


NodeJS
node
> var querystring = require('querystring')
> querystring.parse('error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9') // UTF-8
{ error: 'access_denied', error_description: 'Accès refusé' }

.NET
Request.QueryString     // UTF-8
HttpUtility.ParseQueryString(String)  // UTF-8
// Need to know the encoding before the query string is parsed
HttpUtility.ParseQueryString(String, Encoding)

Ruby
# Rack 3 only parses query strings as UTF-8 but older versions use binary
# strings
Rack::Request.params
URI.decode_www_form_component(str, enc=Encoding::UTF_8)

Python (binary string)
python
>>> from urlparse import parse_qs
>>> parse_qs("error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9")
{'error_description': ['Acc\xc3\xa8s refus\xc3\xa9'], 'error': ['access_denied']}
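
For comparison, Python 3's urllib.parse.parse_qs returns text strings and takes 
an encoding argument that defaults to UTF-8, so the decoding decision is made 
by the library call rather than left to the caller. A minimal sketch:

```python
from urllib.parse import parse_qs

qs = "error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9"

# encoding defaults to "utf-8"; passing it explicitly documents the assumption.
decoded = parse_qs(qs, encoding="utf-8")
print(decoded)
```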

PHP (binary string)
php -r 'parse_str("error=access_denied&error_description=Acc%C3%A8s+refus%C3%A9", $output); print_r($output);'
Array
(
    [error] => access_denied
    [error_description] => Accès refusé
)

Java
// Tomcat has two settings that govern query string parsing: URIEncoding, which
// defaults to ISO-8859-1, and useBodyEncodingForURI, which defaults to false
ServletRequest.getParameter(String name)
// Need to know the encoding before percent decoding
URLDecoder.decode(String s, String enc)
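
To make the cost of a mismatched default concrete, this Python 3 sketch decodes 
the example value once as UTF-8 and once as ISO-8859-1 (the Tomcat URIEncoding 
default mentioned above). It only imitates the effect with urllib; it does not 
run Tomcat or the servlet API.

```python
from urllib.parse import unquote_plus

raw = "Acc%C3%A8s+refus%C3%A9"

as_utf8 = unquote_plus(raw, encoding="utf-8")
# Each UTF-8 octet is misread as a separate Latin-1 character.
as_latin1 = unquote_plus(raw, encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)

# The damage can be undone only if the receiver knows which wrong
# encoding was applied.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")
```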

On May 18, 2011, at 10:48 PM, Julian Reschke wrote:

> On 2011-05-19 01:24, Kris Selden wrote:
>> Is there a problem with sticking to UTF-8? OAuth already mandates JSON which 
>> is Unicode only.
> 
> 
> 
>> Would be nice to keep it simple.
>> 
>> I'm guessing without guidance, most would convert to UTF-8 and percent 
>> encode anyway.
> 
> Without guidance, people usually do not encode at all, and we'll see 
> different encodings on the wire.
> 
> Best regards, Julian

_______________________________________________
OAuth mailing list
OAuth@ietf.org
https://www.ietf.org/mailman/listinfo/oauth
