[ 
https://issues.apache.org/jira/browse/HTTPCORE-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934440#comment-17934440
 ] 

Peter Halicky edited comment on HTTPCORE-778 at 3/12/25 7:44 AM:
-----------------------------------------------------------------

After reading more of RFC3986, I think this is still closer to a bug than to a 
feature request.

Section 2.4 says:
{quote}This is when an implementation determines which of the reserved 
characters are to be used as subcomponent delimiters and which can be safely 
used as data.
{quote}
This appears to conflict with section 6, which would not allow a character to 
carry a different meaning when unencoded (separator) than when encoded (data). 
That distinction is routinely relied on in the query component of a URI, where 
individual parameters are separated by & characters while parameter names or 
values may themselves contain an encoded & (%26).
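
To make the separator/data distinction concrete, here is a minimal sketch of 
query parsing that splits on the raw & first and decodes afterwards. The class 
and method names are made up for illustration, it assumes Java 10+ for 
URLDecoder.decode(String, Charset), and it is not how HttpCore parses queries.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class QuerySplitSketch {

    // Splits the raw (still percent-encoded) query on literal '&' and '='
    // first, and only then decodes each name and value. A raw '&' is a
    // separator; an encoded "%26" survives the split and becomes data.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        // Note: URLDecoder follows form encoding, so it also maps '+' to a space.
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

Parsing "a=1%262&b=3" this way yields two parameters, with the value of "a" 
being "1&2". Decoding the whole query before splitting would lose that.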

This would still conform to section 6 if parameter separation and encoding were 
defined in the query component grammar, but that grammar is rather simple and 
defines neither (section 3.4):
{quote}query = *( pchar / "/" / "?" )
{quote}
URIBuilder even has specific code that allows building queries in which the 
separators (? and &) are left unencoded while the same characters are 
percent-encoded when they occur inside parameter names or values.

Given all of the above, I'd say section 6 needs to be taken with a grain of 
salt and can't be used to dismiss the need to use some characters both encoded 
and unencoded in the fragment component as well. Note that the grammar of the 
fragment component is the same as that of the query component (section 3.5):
{quote}fragment = *( pchar / "/" / "?" )
{quote}
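
As a rough illustration of what that grammar admits, the following sketch 
checks a raw fragment against the RFC3986 production using a regular 
expression. The class and method names are hypothetical; only the character 
sets come from the RFC.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class FragmentGrammarSketch {

    // fragment = *( pchar / "/" / "?" )
    // pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
    private static final Pattern FRAGMENT = Pattern.compile(
            "(?:[A-Za-z0-9\\-._~]"      // unreserved
            + "|%[0-9A-Fa-f]{2}"        // pct-encoded
            + "|[!$&'()*+,;=]"          // sub-delims
            + "|[:@/?])*");             // ":" / "@" / "/" / "?"

    // Returns true if the raw (already encoded) fragment matches the grammar.
    static boolean isValidRawFragment(String raw) {
        return FRAGMENT.matcher(raw).matches();
    }
}
```

Under this check, a raw fragment such as "k=a%26b&x=1" is valid as-is: both the 
literal & and the encoded %26 are allowed to appear side by side.
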
That said, I suppose we can change this ticket to a feature request, although 
the wording should be altered as well, since the discussion has brought us to a 
slightly different place from where we started. The title could be something 
like "allow setting encoded fragment".

I'll be happy to provide a PR. Please let me know whether a simple setter for 
the encodedFragment will do or whether you'd prefer a different approach.
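
To show roughly what I mean by a simple setter, here is a sketch on a 
stand-in class. Everything in it (the class, setEncodedFragment, build) is 
hypothetical and is not the existing URIBuilder API; the only behaviour taken 
from the ticket is that a decoded fragment is percent-encoded with everything 
outside the unreserved set escaped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; NOT HttpCore's URIBuilder.
final class FragmentBuilderSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private String fragment;         // decoded form, encoded at build time
    private String encodedFragment;  // raw form, emitted verbatim

    FragmentBuilderSketch setFragment(String decoded) {
        this.fragment = decoded;
        this.encodedFragment = null;
        return this;
    }

    // Hypothetical setter: the caller is responsible for correct encoding.
    FragmentBuilderSketch setEncodedFragment(String raw) {
        this.encodedFragment = raw;
        this.fragment = null;
        return this;
    }

    String build(String uriWithoutFragment) {
        StringBuilder sb = new StringBuilder(uriWithoutFragment);
        if (encodedFragment != null) {
            sb.append('#').append(encodedFragment);
        } else if (fragment != null) {
            sb.append('#');
            for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
                int c = b & 0xFF;
                boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                        || (c >= '0' && c <= '9')
                        || c == '-' || c == '.' || c == '_' || c == '~';
                if (unreserved) {
                    sb.append((char) c);
                } else {
                    sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
                }
            }
        }
        return sb.toString();
    }
}
```

With setFragment("k=a&b") the sketch produces "#k%3Da%26b", while 
setEncodedFragment("k=a%26b&x=1") is appended unchanged, which is the only way 
to get both a literal & and an encoded %26 into the same fragment.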


> URIBuilder uses incorrect encoding method for URI fragment
> ----------------------------------------------------------
>
>                 Key: HTTPCORE-778
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-778
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore
>    Affects Versions: 5.3.3
>            Reporter: Peter Halicky
>            Priority: Major
>
> URI fragment is encoded in URIBuilder using:
> {code:java}
> PercentCodec.encode(sb, this.fragment, this.charset); {code}
> (line 401, end of buildString method)
> This percent-encodes every character outside the UNRESERVED set.
> As per (obsoleted) RFC2396, the URI fragment should use the URIC safe 
> characters. As per RFC3986, considerably more characters may be left 
> unencoded:
> {code:java}
> pct-encoded   = "%" HEXDIG HEXDIG
> unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
> sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
> pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
> fragment      = *( pchar / "/" / "?" ) {code}
> Note that URIBuilder in httpclient 4.5.13 conforms at least to the old 
> RFC2396, as it uses the URIC set of safe characters (i.e. this is in fact a 
> regression).
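
The difference between the two safe sets described in the ticket can be 
sketched as follows. This is an illustrative encoder with made-up names, not 
the PercentCodec implementation; the character sets are taken from the RFC3986 
grammar quoted above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class FragmentEncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Non-alphanumeric members of "unreserved".
    private static final String UNRESERVED_EXTRA = "-._~";
    // unreserved + sub-delims + ":" / "@" / "/" / "?" (legal in a fragment).
    private static final String FRAGMENT_EXTRA = "-._~!$&'()*+,;=:@/?";

    static String encodeUnreservedOnly(String s) {
        return encode(s, UNRESERVED_EXTRA);
    }

    static String encodeFragment(String s) {
        return encode(s, FRAGMENT_EXTRA);
    }

    // Percent-encodes every UTF-8 byte that is neither an ASCII letter or
    // digit nor listed in safeExtra.
    private static String encode(String s, String safeExtra) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9');
            if (alnum || safeExtra.indexOf(c) >= 0) {
                sb.append((char) c);
            } else {
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return sb.toString();
    }
}
```

For the input "a=b&c/d?e", the unreserved-only variant returns 
"a%3Db%26c%2Fd%3Fe", whereas the fragment variant returns the input unchanged. 
Both still escape characters such as space and #.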


