> On 11 Jun 2015, at 08:35, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>
> @everybody
>
> The key method that defines how the query part of a URL is percent encoded is
> ZnResourceMetaUtils class>>#querySafeSet
>
> Years ago, Zinc HTTP Components followed the better safe than sorry approach
> of encoding almost every character except for the ones that are safe in all
> contexts.
>
> Later on, we began reading the specs better and decided to follow them more
> closely, that is why there are now different safe sets.
>
> Now, we can (and should) all read the different specs, and try to learn from
> things in the wild as well as from other implementations.
>
> The quote from http://en.wikipedia.org/wiki/Query_string was incomplete, it
> said 'for HTML 5 when submitting a form using GET', which is a very specific
> context.
>
> ZnUrl was written against RFC 3986 mostly.
>
> Now, maybe we made a mistake, maybe not.

I looked into this a bit more, and I am confused.

My most strict reading of RFC 3986 (which obsoletes RFC 2396) says in section 3.4 Query:

  query = *( pchar / "/" / "?" )

where

  pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
  unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

which I understand to allow the comma (,) unescaped in the query.

The description above is what is in ZnResourceMetaUtils class>>#querySafeSet, with the noted exceptions (=, & and +, because we interpret the query as key-value pairs).

I read http://www.w3.org/Addressing/URL/uri-spec.html the same way.

That being said, there are counterexamples, like when you search for foo,bar in Google using Google Chrome, which then results in the URL:

  https://www.google.be/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo%2Cbar

Or when you do

$ curl -G -v --data-urlencode "foo=one,two" "http://zn.stfx.eu/echo?q=1,2,3&x=a,b"
* Hostname was NOT found in DNS cache
* Trying 46.137.113.215...
* Connected to zn.stfx.eu (46.137.113.215) port 80 (#0)
> GET /echo?q=1,2,3&x=a,b&foo=one%2Ctwo HTTP/1.1
> User-Agent: curl/7.37.1
> Host: zn.stfx.eu
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 11 Jun 2015 14:23:06 GMT
* Server Zinc HTTP Components 1.0 is not blacklisted
< Server: Zinc HTTP Components 1.0
< Content-Type: text/plain;charset=utf-8
< Content-Length: 421
< Vary: Accept-Encoding
<
This is Zinc HTTP Components echoing your request !
Running a ZnManagingMultiThreadedServer(running 8083)
GET request for /echo?q=1,2,3&x=a,b&foo=one,two
with headers
 X-Forwarded-Server: ip-10-226-6-28.eu-west-1.compute.internal
 X-Forwarded-Host: zn.stfx.eu
 X-Zinc-Remote-Address: 127.0.0.1
 User-Agent: curl/7.37.1
 Host: localhost:8083
 Connection: Keep-Alive
 Accept: */*
 X-Forwarded-For: 81.83.7.35
* Connection #0 to host zn.stfx.eu left intact

Reading about JavaScript's encodeURI and encodeURIComponent functions does not help either (the first one keeps the comma, the latter one encodes it).

I know there are some other people on this list that might have an opinion, so let's try to figure this out together.

> But maybe it also would be a good idea to allow users to decide this for
> themselves on a case by case basis.
>
>> On 11 Jun 2015, at 05:18, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>
>> Thanks for the reply.
>>
>> I implemented Peter's suggestion as an easy keep moving solution.
>>
>> As I said, I am not an expert in what is or is not legal according to the
>> standards.
>> However, Python's urllib library encodes commas by default in its quote
>> and urlencode methods.
>>
>> _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>                          b'abcdefghijklmnopqrstuvwxyz'
>>                          b'0123456789'
>>                          b'_.-')
>>
>> https://docs.python.org/3/library/urllib.parse.html
>> https://hg.python.org/cpython/file/3.4/Lib/urllib/parse.py
>>
>> That's at least how one major language understands the standard. And Python
>> 2.7 is the same.
>>
>> According to Wikipedia
>> http://en.wikipedia.org/wiki/Query_string
>> • Characters that cannot be converted to the correct charset are replaced
>>   with HTML numeric character references[9]
>> • SPACE is encoded as '+'
>> • Letters (A–Z and a–z), numbers (0–9) and the characters '*','-','.' and
>>   '_' are left as-is
>>
>> It appeared in the stackoverflow article I quoted previously that ASP.NET
>> encodes commas. I could misunderstand or be reading into it.
>> http://stackoverflow.com/questions/8828702/why-is-the-comma-url-encoded
>> Just a little more information to add to the discussion.
>>
>> Thanks.
>>
>> Jimmie
>>
>>
>> On 06/10/2015 05:56 PM, Norbert Hartl wrote:
>>> Just to clarify:
>>>
>>> "
>>> Characters in the "reserved" set are not reserved in
>>> all contexts.
>>>
>>> The set of characters actually reserved within any given URI
>>> component is defined by that component. In general, a character is
>>> reserved if the semantics of the URI changes if the character is
>>> replaced with its escaped US-ASCII encoding."
>>>
>>> If I were you I'd subclass ZnUrl and implement
>>> #encodeQuery:on:
>>> on that class. You could have an extension method in ZnResourceMetaUtils
>>> that returns the character set you need to have encoded. In ZnClient you
>>> just set your ZnUrl derived class object as #url:
>>> I cannot think of anything better for a quick resolution of your problem.
>>> Norbert
>>>> On 11.06.2015 at 00:26, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>
>>>> I am not an expert on URIs or encoding. However, this is a requirement of
>>>> the API I am using and I am required to submit an encoded URI with %2C and
>>>> no commas.
>>>>
>>>> As far as commas needing to be escaped, it seems from other sources that
>>>> they should be.
>>>>
>>>> From https://www.ietf.org/rfc/rfc2396.txt
>>>> The plus "+", dollar "$", and comma "," characters have been added to
>>>> those in the "reserved" set, since they are treated as reserved
>>>> within the query component.
>>>>
>>>> This states that commas are reserved within the query component.
>>>>
>>>> http://stackoverflow.com/questions/8828702/why-is-the-comma-url-encoded
>>>>
>>>> Regardless of what is or is not required, I do need the ability to have a
>>>> query string with commas encoded as %2C in order to satisfy and use the
>>>> API, which states:
>>>>
>>>> fields: Optional An URL encoded (%2C) comma separated list of instrument
>>>> fields that are to be returned in the response. The instrument field will
>>>> be returned regardless of the input to this query parameter. Please see
>>>> the Response Parameters section below for a list of valid values.
>>>>
>>>> Which will look like this or something similar.
>>>>
>>>> fields=displayName%2Cinstrument%2Cpip
>>>>
>>>> Thanks.
>>>>
>>>> Jimmie
>>>>
>>>> On 06/10/2015 03:27 PM, Norbert Hartl wrote:
>>>>> That's because the comma does not need to be escaped in the query part of
>>>>> the URI.
>>>>>
>>>>> Norbert
>>>>>
>>>>>> On 10.06.2015 at 22:00, Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>>>
>>>>>> On 06/10/2015 10:32 AM, Sven Van Caekenberghe wrote:
>>>>>>
>>>>>>>> On 10 Jun 2015, at 17:24, David <stormb...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, 10 Jun 2015 10:14:37 -0500,
>>>>>>>> Jimmie Houchin <jlhouc...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am attempting to use ZnClient to request data. The request requires
>>>>>>>>> a %2C (comma) delimited string as part of the query. Below is a
>>>>>>>>> snippet.
>>>>>>>>>
>>>>>>>>> znClient
>>>>>>>>>     addPath: '/v1/instruments';
>>>>>>>>>     queryAt: 'fields' putAll: 'displayName%2Cinstrument%2Cpip';
>>>>>>>>>     get;
>>>>>>>>>     contents
>>>>>>>>>
>>>>>>>>> The string 'displayName%2Cinstrument%2Cpip'
>>>>>>>>> is being converted to 'displayName%252Cinstrument%252Cpip'
>>>>>>>>> which causes the request to fail.
>>>>>>>>>
>>>>>>>>> The query needs to be
>>>>>>>>> fields=displayName%2Cinstrument%2Cpip
>>>>>>>>>
>>>>>>>>> I have not found how to do this correctly.
>>>>>>>>> Any help greatly appreciated.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Jimmie
>>>>>>>>>
>>>>>>>> Maybe a silly thing, but since %2C = , ... Did you already try to
>>>>>>>> let it do the encoding itself? Like
>>>>>>>> znClient
>>>>>>>>     addPath: '/v1/instruments';
>>>>>>>>     queryAt: 'fields' putAll: 'displayName,instrument,pip';
>>>>>>>>     get;
>>>>>>>>     contents
>>>>>>>>
>>>>>>>> I suspect it is encoding internally, which is why % is also
>>>>>>>> encoded if you try to put it in yourself.
>>>>>>>>
>>>>>>>> I hope that works
>>>>>>>>
>>>>>>> Not silly and no need to suspect, but absolutely correct !
>>>>>>>
>>>>>>> Sven
>>>>>>>
>>>>>> My apologies for not having full disclosure.
>>>>>>
>>>>>> Pharo 4, new image, freshly installed Zinc stable version.
>>>>>> Xubuntu 15.04
>>>>>>
>>>>>> That is what I thought would happen and what I tried first. But it is
>>>>>> not being encoded from what I can find.
>>>>>>
>>>>>> Inspect this in a workspace/playground.
>>>>>>
>>>>>> ZnClient new
>>>>>>     https;
>>>>>>     host: 'google.com';
>>>>>>     addPath: '/commaTest';
>>>>>>     queryAt: 'fields' put: 'displayName,instrument,pip';
>>>>>>     yourself
>>>>>>
>>>>>> View the request / requestLine / uri. The commas are still present in
>>>>>> the URI.
>>>>>> So I tried encoding it myself and got the other error.
>>>>>>
>>>>>> Of course Google won't understand this and in this snippet won't receive
>>>>>> it.
>>>>>>
>>>>>> And please let me know if I am doing something wrong.
>>>>>>
>>>>>> Any help greatly appreciated.
>>>>>>
>>>>>> Jimmie
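
For anyone following along outside Pharo, the two behaviours discussed in this thread (leaving the comma alone per the RFC 3986 query grammar versus encoding it as %2C), plus the %252C double-encoding from the original report, can be reproduced with Python's urllib.parse.quote, which the thread already cites. This is only a rough sketch: the names QUERY_VALUE_SAFE, encode_lenient and encode_strict are made up for illustration and are not part of Zinc or urllib, and the safe set is my reading of the grammar quoted above, not a copy of what #querySafeSet contains.

```python
from urllib.parse import quote

# Characters RFC 3986 section 3.4 allows unescaped in a query:
#   query = *( pchar / "/" / "?" )
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
SUB_DELIMS = "!$&'()*+,;="
QUERY_ALLOWED = UNRESERVED + SUB_DELIMS + ":@/?"

# Assumption: when the query is treated as key=value pairs, "=", "&" and "+"
# have to be encoded inside a key or value, as described in the thread.
QUERY_VALUE_SAFE = "".join(c for c in QUERY_ALLOWED if c not in "=&+")


def encode_lenient(value: str) -> str:
    """Encode only what the RFC 3986 query grammar requires (comma stays)."""
    return quote(value, safe=QUERY_VALUE_SAFE)


def encode_strict(value: str) -> str:
    """Encode everything except letters, digits and "_.-~" (comma -> %2C)."""
    return quote(value, safe="")


if __name__ == "__main__":
    fields = "displayName,instrument,pip"
    print(encode_lenient(fields))   # comma is kept
    print(encode_strict(fields))    # comma becomes %2C
    # Passing an already encoded value encodes the "%" again,
    # which is the %252C effect from the original report.
    print(encode_strict("displayName%2Cinstrument%2Cpip"))
```

Both outputs are valid under RFC 3986; which one a given server accepts is a separate question, which is why a per-request choice of safe set seems useful.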