[ https://issues.apache.org/jira/browse/HTTPCORE-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934440#comment-17934440 ]
Peter Halicky edited comment on HTTPCORE-778 at 3/12/25 7:44 AM:
-----------------------------------------------------------------

After reading more of RFC 3986, I think this is still closer to a bug than to a feature request. Section 2.4 says:
{quote}This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data.
{quote}
This appears to contradict section 6, which would not allow a character to mean one thing when unencoded (separator) and another when encoded (data). Yet that distinction is routinely relied on in the query component of a URI: individual parameters are separated by unencoded & characters, while parameter values (or even parameter names) may contain an encoded &.

This would still conform to section 6 if parameter separation and encoding were defined in the query component grammar. But the query component grammar is rather simple (section 3.4):
{quote}query = *( pchar / "/" / "?" )
{quote}
URIBuilder even has specific code for building queries in which the separators (? and &) are left unencoded while the same characters are encoded when they occur inside parameter values.

Given all of the above, I'd say section 6 needs to be taken with a grain of salt. It can't be used to dismiss the need for some characters to appear both encoded and unencoded in the fragment component as well. Note that the grammar of the fragment component is the same as that of the query component (section 3.5):
{quote}fragment = *( pchar / "/" / "?" )
{quote}
That said, I suppose we can change this ticket to a feature request, although the wording should also be altered somewhat, as the discussion has brought us to a slightly different place from where we started. I suppose it should be something like "allow setting encoded fragment".

I'll be happy to provide a PR. Please let me know whether a simple setter for the encodedFragment will do or whether you'd like a different approach.
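To make the delimiter-versus-data point concrete, here is a minimal Java sketch. It is not URIBuilder code; the class `QueryDelimiterSketch` and its methods are hypothetical names used only for illustration. It assumes Java 10 or later for `URLDecoder.decode(String, Charset)`. It splits the raw query on unencoded & and = first and percent-decodes each part only afterwards, so an encoded %26 survives as data.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class, not part of HttpCore.
final class QueryDelimiterSketch {

    // Split on the raw (unencoded) delimiters first, then decode each piece.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Caveat: URLDecoder follows form-encoding rules and also maps '+' to a space.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Encoded '&' stays inside the value of q: two parameters.
        System.out.println(parse("q=a%26b&lang=en"));
        // Unencoded '&' acts as a separator: three parameters.
        System.out.println(parse("q=a&b&lang=en"));
    }
}
```

The two inputs differ only in whether the & is percent-encoded, yet they parse to different parameter sets. The same distinction is what an encoded-fragment setter would make available for the fragment component.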
> URIBuilder uses incorrect encoding method for URI fragment
> ----------------------------------------------------------
>
>                 Key: HTTPCORE-778
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-778
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore
>    Affects Versions: 5.3.3
>            Reporter: Peter Halicky
>            Priority: Major
>
> URI fragment is encoded in URIBuilder using:
> {code:java}
> PercentCodec.encode(sb, this.fragment, this.charset); {code}
> (line 401, end of buildString method)
> This encodes all characters except UNRESERVED using the percent-format.
> As per (obsoleted) RFC2396, URI fragment should use URIC safe-chars.
> As per RFC3986, quite a bit more characters should not be encoded:
> {code:java}
> pct-encoded = "%" HEXDIG HEXDIG
> unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
> sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
> pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
> fragment    = *( pchar / "/" / "?" ) {code}
> Note that URIBuilder in httpclient 4.5.13 conforms to at least the old
> RFC2396, as it uses URIC set of safe characters (i.e. this is in fact a
> regression).
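As a rough illustration of the RFC 3986 grammar quoted in the issue description, the sketch below percent-encodes only those characters that the `fragment` rule does not allow literally. It is not the HttpCore implementation and not a proposed patch; `FragmentEncodingSketch` and its methods are hypothetical names, and UTF-8 is assumed as the charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of HttpCore.
final class FragmentEncodingSketch {

    private static final String UNRESERVED_PUNCT = "-._~";
    private static final String SUB_DELIMS = "!$&'()*+,;=";
    // pchar adds ':' and '@'; the fragment rule adds '/' and '?'.
    private static final String FRAGMENT_EXTRA = ":@/?";

    // True if c may appear literally in a fragment per RFC 3986 section 3.5.
    static boolean isFragmentSafe(char c) {
        return (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || UNRESERVED_PUNCT.indexOf(c) >= 0
                || SUB_DELIMS.indexOf(c) >= 0
                || FRAGMENT_EXTRA.indexOf(c) >= 0;
    }

    // Treats the input as unencoded data, so a literal '%' becomes "%25".
    static String encodeFragment(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && isFragmentSafe((char) v)) {
                sb.append((char) v);
            } else {
                sb.append('%');
                sb.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                sb.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeFragment("section/1?x=a&b"));
        System.out.println(encodeFragment("a b"));
    }
}
```

Because this encoder always treats its input as raw data, it cannot produce a fragment in which the same character appears both literally and percent-encoded. That limitation is the motivation for letting callers supply an already-encoded fragment.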