> On 11 Jun 2015, at 08:35, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> 
> @everybody
> 
> The key method that defines how the query part of a URL is percent-encoded is
> ZnResourceMetaUtils class>>#querySafeSet
> 
> Years ago, Zinc HTTP Components followed the better-safe-than-sorry approach
> of encoding almost every character except for the ones that are safe in all
> contexts.
> 
> Later on, we began reading the specs more carefully and decided to follow them
> more closely; that is why there are now different safe sets.
> 
> Now, we can (and should) all read the different specs, and try to learn from
> things in the wild as well as from other implementations.
> 
> The quote from http://en.wikipedia.org/wiki/Query_string was incomplete: it
> said 'for HTML 5 when submitting a form using GET', which is a very specific
> context.
> 
> ZnUrl was written against RFC 3986 mostly.
> 
> Now, maybe we made a mistake, maybe not.

I looked into this a bit more, and I am confused.

Reading RFC 3986 (which obsoletes RFC 2396) as strictly as I can, section 3.4
Query says:

query       = *( pchar / "/" / "?" )

where 

pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

which I understand to allow a literal comma (,) in the query.

The grammar above is what ZnResourceMetaUtils class>>#querySafeSet implements,
with the noted exceptions of =, & and + (because we interpret the query as
key-value pairs).
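As a rough illustration, here is a Python sketch of that safe set. It assumes the set is exactly RFC 3986 `pchar` plus "/" and "?", minus "=", "&" and "+". It is not Zinc's code: `QUERY_SAFE` and `encode_query_value` are made-up names, and `urllib.parse.quote` does the actual percent-encoding.

```python
from urllib.parse import quote

# RFC 3986 section 2.3 unreserved (ALPHA and DIGIT are never encoded by quote)
UNRESERVED = "-._~"
# RFC 3986 section 2.2 sub-delims
SUB_DELIMS = "!$&'()*+,;="
# Section 3.4: query = *( pchar / "/" / "?" ), and pchar adds ":" and "@"
RFC3986_QUERY_CHARS = UNRESERVED + SUB_DELIMS + ":@/?"

# Assumed exceptions: "=" and "&" delimit key-value pairs, "+" may mean space.
QUERY_SAFE = "".join(c for c in RFC3986_QUERY_CHARS if c not in "=&+")


def encode_query_value(value: str) -> str:
    """Percent-encode value (as UTF-8), leaving QUERY_SAFE characters as-is."""
    return quote(value, safe=QUERY_SAFE)


print(encode_query_value("displayName,instrument,pip"))  # commas stay literal
print(encode_query_value("a=b&c+d"))                      # a%3Db%26c%2Bd
```

Because "%" is not in the safe set, an already-encoded value such as %2C is encoded again, as %252C.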

I read the same in http://www.w3.org/Addressing/URL/uri-spec.html.

That being said, there are counterexamples. For instance, when you search for
foo,bar in Google using Google Chrome, the result is the URL:

https://www.google.be/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo%2Cbar

Or when you run:

$ curl -G -v --data-urlencode "foo=one,two" "http://zn.stfx.eu/echo?q=1,2,3&x=a,b"
* Hostname was NOT found in DNS cache
*   Trying 46.137.113.215...
* Connected to zn.stfx.eu (46.137.113.215) port 80 (#0)
> GET /echo?q=1,2,3&x=a,b&foo=one%2Ctwo HTTP/1.1
> User-Agent: curl/7.37.1
> Host: zn.stfx.eu
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Thu, 11 Jun 2015 14:23:06 GMT
* Server Zinc HTTP Components 1.0 is not blacklisted
< Server: Zinc HTTP Components 1.0
< Content-Type: text/plain;charset=utf-8
< Content-Length: 421
< Vary: Accept-Encoding
< 
This is Zinc HTTP Components echoing your request !
Running a ZnManagingMultiThreadedServer(running 8083)
GET request for /echo?q=1,2,3&x=a,b&foo=one,two
with headers
 X-Forwarded-Server: ip-10-226-6-28.eu-west-1.compute.internal
 X-Forwarded-Host: zn.stfx.eu
 X-Zinc-Remote-Address: 127.0.0.1
 User-Agent: curl/7.37.1
 Host: localhost:8083
 Connection: Keep-Alive
 Accept: */*
 X-Forwarded-For: 81.83.7.35

* Connection #0 to host zn.stfx.eu left intact
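In the transcript, curl's --data-urlencode encodes the comma in the value it was given, while the commas typed directly into the URL pass through untouched. The echo server then decodes both to the same text. A small Python sketch of that round trip follows. It uses only `urllib.parse` and mimics the observed result, not curl's actual implementation.

```python
from urllib.parse import quote, parse_qs

literal = "q=1,2,3&x=a,b"                      # commas written directly in the URL
encoded = "foo=" + quote("one,two", safe="")   # like --data-urlencode "foo=one,two"
query = literal + "&" + encoded
print(query)   # q=1,2,3&x=a,b&foo=one%2Ctwo

# A decoder yields identical values whether or not the comma was encoded.
params = parse_qs(query)
print(params)  # {'q': ['1,2,3'], 'x': ['a,b'], 'foo': ['one,two']}
```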

Reading about JavaScript's encodeURI and encodeURIComponent functions does not
help either (the former keeps the comma, the latter encodes it).
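For comparison, here is a Python approximation of the two JavaScript functions. It uses the sets of characters that the ECMAScript specification leaves unescaped, apart from ASCII letters and digits. The function names are ours, and JavaScript-specific details such as the URIError raised for lone surrogates are not modelled.

```python
from urllib.parse import quote

# Besides ASCII letters and digits, the characters each function leaves as-is.
ENCODE_URI_COMPONENT_SAFE = "-_.!~*'()"
ENCODE_URI_SAFE = ENCODE_URI_COMPONENT_SAFE + ";/?:@&=+$,#"


def encode_uri(s: str) -> str:
    """Approximates JavaScript's encodeURI (keeps reserved chars, e.g. ',')."""
    return quote(s, safe=ENCODE_URI_SAFE)


def encode_uri_component(s: str) -> str:
    """Approximates JavaScript's encodeURIComponent (encodes ',')."""
    return quote(s, safe=ENCODE_URI_COMPONENT_SAFE)


print(encode_uri("foo,bar"))            # foo,bar
print(encode_uri_component("foo,bar"))  # foo%2Cbar
```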

I know there are some other people on this list that might have an opinion, so 
let's try to figure this out together.

> But maybe it also would be a good idea to allow users to decide this for 
> themselves on a case by case basis.
> 
>> On 11 Jun 2015, at 05:18, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>> 
>> Thanks for the reply.
>> 
>> I implemented Peter's suggestion as a quick way to keep moving.
>> 
>> As I said, I am not an expert in what is or is not legal according to the
>> standards.
>> However, Python's urllib library encodes commas by default in its quote and
>> urlencode functions.
>> 
>> _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>                         b'abcdefghijklmnopqrstuvwxyz'
>>                         b'0123456789'
>>                         b'_.-')
>> 
>> https://docs.python.org/3/library/urllib.parse.html
>> https://hg.python.org/cpython/file/3.4/Lib/urllib/parse.py
>> 
>> That is at least how one major language interprets the standard, and Python
>> 2.7 behaves the same way.
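A quick check of those defaults, using only Python 3's `urllib.parse`. By default, `quote` keeps "/" safe, and `urlencode` uses `quote_plus`, which turns spaces into "+". Commas are encoded in every case:

```python
from urllib.parse import quote, quote_plus, urlencode

print(quote("one,two"))     # one%2Ctwo   (default safe="/")
print(quote("a/b,c"))       # a/b%2Cc     ("/" kept by default)
print(quote_plus("a b,c"))  # a+b%2Cc     (space becomes "+")
print(urlencode({"fields": "displayName,instrument,pip"}))
# fields=displayName%2Cinstrument%2Cpip
```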
>> 
>> According to Wikipedia
>> http://en.wikipedia.org/wiki/Query_string
>> • Characters that cannot be converted to the correct charset are replaced 
>> with HTML numeric character references[9]
>> • SPACE is encoded as '+'
>> • Letters (A–Z and a–z), numbers (0–9) and the characters '*','-','.' and 
>> '_' are left as-is
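The rules in that list are simple enough to sketch directly. Below is a Python version for a single value. It assumes UTF-8 as the form's charset, so the first rule (about characters that cannot be converted) never applies. `form_urlencode_value` is a made-up name.

```python
import string

# Characters left as-is, per the list above (ASCII only).
FORM_SAFE = set(string.ascii_letters + string.digits + "*-._")


def form_urlencode_value(value: str) -> str:
    """Encode one value: space -> '+', FORM_SAFE unchanged,
    every other UTF-8 byte -> %XX (uppercase hex)."""
    out = []
    for byte in value.encode("utf-8"):
        if byte == 0x20:
            out.append("+")
        elif chr(byte) in FORM_SAFE:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


print(form_urlencode_value("one,two three"))  # one%2Ctwo+three
print(form_urlencode_value("café"))           # caf%C3%A9
```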
>> 
>> From the Stack Overflow question I cited previously, it appears that ASP.NET
>> encodes commas, though I may be misreading it.
>> http://stackoverflow.com/questions/8828702/why-is-the-comma-url-encoded
>> Just a little more information to add to the discussion.
>> 
>> Thanks.
>> 
>> Jimmie
>> 
>> 
>> 
>> 
>> On 06/10/2015 05:56 PM, Norbert Hartl wrote:
>>> Just to clarify:
>>> 
>>> "Characters in the "reserved" set are not reserved in all contexts.
>>> The set of characters actually reserved within any given URI
>>> component is defined by that component. In general, a character is
>>> reserved if the semantics of the URI changes if the character is
>>> replaced with its escaped US-ASCII encoding."
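To see what "the semantics of the URI changes" means in practice, here is a check with Python's `urllib.parse.parse_qs`. This shows Python's query parser, not Zinc's. For this parser, escaping "&" changes the result, but escaping the comma does not.

```python
from urllib.parse import parse_qs

print(parse_qs("a=1&b=2"))    # {'a': ['1'], 'b': ['2']}  "&" separates pairs
print(parse_qs("a=1%26b=2"))  # {'a': ['1&b=2']}          escaped "&" is data
print(parse_qs("x=a,b") == parse_qs("x=a%2Cb"))  # True: same meaning here
```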
>>> 
>>> If I were you, I'd subclass ZnUrl and implement #encodeQuery:on: on that
>>> class. You could have an extension method in ZnResourceMetaUtils that
>>> returns the character set you need to have encoded. In ZnClient, you then
>>> just set an instance of your ZnUrl subclass with #url:.
>>> I cannot think of anything better as a quick fix for your problem.
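The idea of overriding only the safe set can be illustrated outside Smalltalk. Below is a Python analogue, not Zinc's API. `encode_query` and `DEFAULT_QUERY_SAFE` are hypothetical, and the default assumes an RFC 3986 query set minus "=", "&" and "+". A caller who needs %2C simply removes the comma from the safe set.

```python
from urllib.parse import quote

# Hypothetical default: RFC 3986 query characters minus "=", "&" and "+".
DEFAULT_QUERY_SAFE = "-._~!$'()*,;:@/?"


def encode_query(params, safe=DEFAULT_QUERY_SAFE):
    """Build key=value pairs, letting the caller choose the safe set."""
    return "&".join(
        quote(key, safe=safe) + "=" + quote(value, safe=safe)
        for key, value in params
    )


params = [("fields", "displayName,instrument,pip")]
print(encode_query(params))  # fields=displayName,instrument,pip
print(encode_query(params, safe=DEFAULT_QUERY_SAFE.replace(",", "")))
# fields=displayName%2Cinstrument%2Cpip
```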
>>> Norbert
>>>> On 11.06.2015 at 00:26, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>> 
>>>> I am not an expert on URIs or encoding. However, this is a requirement of
>>>> the API I am using: I have to submit an encoded URI with %2C and no
>>>> commas.
>>>> 
>>>> As for whether commas need to be escaped, other sources suggest that
>>>> they should be.
>>>> 
>>>> From https://www.ietf.org/rfc/rfc2396.txt
>>>> The plus "+", dollar "$", and comma "," characters have been added to
>>>>   those in the "reserved" set, since they are treated as reserved
>>>>   within the query component.
>>>> 
>>>> This states that commas are reserved within the query component.
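Being reserved does not, by itself, settle whether a character must be encoded; the component's grammar does. Below is a small Python comparison of the relevant sets, transcribed from RFC 2396 section 2.2 and RFC 3986 sections 2.2 and 3.4. The variable names are ours.

```python
# RFC 2396, section 2.2: reserved
RFC2396_RESERVED = set(";/?:@&=+$,")
# RFC 3986, section 2.2: gen-delims and sub-delims
RFC3986_GEN_DELIMS = set(":/?#[]@")
RFC3986_SUB_DELIMS = set("!$&'()*+,;=")
# RFC 3986, section 3.4: query = *( pchar / "/" / "?" ), so every sub-delim
# plus ":", "@", "/" and "?" may appear literally in a query.
RFC3986_QUERY_LITERAL_DELIMS = RFC3986_SUB_DELIMS | set(":@/?")

print("," in RFC2396_RESERVED)              # True
print("," in RFC3986_QUERY_LITERAL_DELIMS)  # True
# Gen-delims that may not appear literally in a query:
print(sorted(RFC3986_GEN_DELIMS - RFC3986_QUERY_LITERAL_DELIMS))  # ['#', '[', ']']
```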
>>>> 
>>>> 
>>>> http://stackoverflow.com/questions/8828702/why-is-the-comma-url-encoded
>>>> 
>>>> 
>>>> Regardless of what is or is not required, I do need to be able to send a
>>>> query string with commas encoded as %2C in order to use the API, which
>>>> states:
>>>> 
>>>> fields: Optional An URL encoded (%2C) comma separated list of instrument 
>>>> fields that are to be returned in the response. The instrument field will 
>>>> be returned regardless of the input to this query parameter. Please see 
>>>> the Response Parameters section below for a list of valid values.
>>>> 
>>>> The result should look like this, or something similar:
>>>> 
>>>> fields=displayName%2Cinstrument%2Cpip
>>>> 
>>>> 
>>>> Thanks.
>>>> 
>>>> Jimmie
>>>> 
>>>> 
>>>> On 06/10/2015 03:27 PM, Norbert Hartl wrote:
>>>>> That's because the comma does not need to be escaped in the query part of 
>>>>> the uri.
>>>>> 
>>>>> Norbert
>>>>> 
>>>>> 
>>>>>> On 10.06.2015 at 22:00, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>>> 
>>>>>> On 06/10/2015 10:32 AM, Sven Van Caekenberghe wrote:
>>>>>> 
>>>>>>>> On 10 Jun 2015, at 17:24, David <stormb...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> On Wed, 10 Jun 2015 10:14:37 -0500, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> I am attempting to use ZnClient to request data. The request requires
>>>>>>>>> a %2C (comma) delimited string as part of the query. Below is a
>>>>>>>>> snippet.
>>>>>>>>> 
>>>>>>>>> znClient
>>>>>>>>>        addPath: '/v1/instruments';
>>>>>>>>>        queryAt: 'fields' put: 'displayName%2Cinstrument%2Cpip';
>>>>>>>>>        get;
>>>>>>>>>        contents
>>>>>>>>> 
>>>>>>>>> The string 'displayName%2Cinstrument%2Cpip'
>>>>>>>>> is being converted to 'displayName%252Cinstrument%252Cpip',
>>>>>>>>> which causes the request to fail.
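The doubling happens because the encoder treats its input as raw text, so the "%" of %2C is itself encoded as %25. Here is a Python illustration using `urllib.parse.quote` with an empty safe set. It is an analogy only, not what ZnClient runs internally.

```python
from urllib.parse import quote, unquote

pre_encoded = "displayName%2Cinstrument%2Cpip"
raw = "displayName,instrument,pip"

# Encoding an already-encoded value escapes its "%" as "%25".
print(quote(pre_encoded, safe=""))  # displayName%252Cinstrument%252Cpip
# Encoding the raw value exactly once gives what the API expects.
print(quote(raw, safe=""))          # displayName%2Cinstrument%2Cpip
print(unquote(pre_encoded) == raw)  # True
```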
>>>>>>>>> 
>>>>>>>>> The query needs to be
>>>>>>>>> fields=displayName%2Cinstrument%2Cpip
>>>>>>>>> 
>>>>>>>>> I have not found how to do this correctly.
>>>>>>>>> Any help greatly appreciated.
>>>>>>>>> 
>>>>>>>>> Thanks.
>>>>>>>>> 
>>>>>>>>> Jimmie
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> Maybe a silly thing, but since %2C is just an encoded comma (,), have you
>>>>>>>> already tried letting it do the encoding itself? Like:
>>>>>>>> znClient
>>>>>>>>         addPath: '/v1/instruments';
>>>>>>>>         queryAt: 'fields' put: 'displayName,instrument,pip';
>>>>>>>>         get;
>>>>>>>>         contents
>>>>>>>> 
>>>>>>>> I suspect it encodes internally; that is why the % also gets encoded when
>>>>>>>> you put it in yourself.
>>>>>>>> 
>>>>>>>> I hope that works
>>>>>>>> 
>>>>>>> Not silly and no need to suspect, but absolutely correct !
>>>>>>> 
>>>>>>> Sven
>>>>>>> 
>>>>>> My apologies for not having full disclosure.
>>>>>> 
>>>>>> Pharo 4, new image, freshly installed Zinc stable version.
>>>>>> Xubuntu 15.04
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> That is what I thought would happen, and it is what I tried first. But from
>>>>>> what I can see, the commas are not being encoded.
>>>>>> 
>>>>>> Inspect this in a workspace/playground.
>>>>>> 
>>>>>> ZnClient new
>>>>>>       https;
>>>>>>       host: 'google.com';
>>>>>>       addPath: '/commaTest';
>>>>>>       queryAt: 'fields' put: 'displayName,instrument,pip';
>>>>>>       yourself
>>>>>> 
>>>>>> View the request / requestLine / uri. The commas are still present in
>>>>>> the URI. So I tried encoding them myself and got the other error (the %
>>>>>> of %2C itself being encoded as %25).
>>>>>> 
>>>>>> Of course, Google won't understand this request, and in this snippet it
>>>>>> is never sent anyway.
>>>>>> 
>>>>>> And please let me know if I am doing something wrong.
>>>>>> 
>>>>>> Any help greatly appreciated.
>>>>>> 
>>>>>> Jimmie
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 

