[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756479#comment-16756479
 ] 

Jay Modi edited comment on HTTPCLIENT-1968 at 1/30/19 7:35 PM:
---------------------------------------------------------------

My apologies; the document I was looking at was actually not RFC 2396 but an 
updated and expired draft: 
[https://tools.ietf.org/id/draft-fielding-uri-rfc2396bis-07.txt]

 

In [section 2.4.2|https://tools.ietf.org/html/rfc2396#section-2.4.2] of the 
actual RFC 2396, the following is stated:
{quote}2.4.2. When to Escape and Unescape
 A URI is always in an "escaped" form, since escaping or unescaping a completed 
URI might change its semantics. Normally, the only time escape encodings can 
safely be made is when the URI is being created from its component parts; each 
component may have its own set of characters that are reserved, so only the 
mechanism responsible for generating or interpreting that component can 
determine whether or not escaping a character will change its semantics. 
Likewise, a URI must be separated into its components before the escaped 
characters within those components can be safely decoded.
{quote}
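As a minimal sketch of the distinction that section draws (this is not 
HttpClient's code; the class name, method names, and the example.com URI are 
made up for illustration, and the decoder assumes ASCII input outside of 
escapes), compare splitting the raw path before decoding with decoding the 
whole path first:

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SegmentDecodingSketch {

    // Split the raw (still escaped) path into segments, then decode each one.
    static List<String> splitThenDecode(URI uri) {
        List<String> segments = new ArrayList<>();
        for (String raw : uri.getRawPath().split("/")) {
            if (!raw.isEmpty()) {
                segments.add(percentDecode(raw));
            }
        }
        return segments;
    }

    // Decode the whole path first (URI#getPath does this), then split.
    static List<String> decodeThenSplit(URI uri) {
        List<String> segments = new ArrayList<>();
        for (String decoded : uri.getPath().split("/")) {
            if (!decoded.isEmpty()) {
                segments.add(decoded);
            }
        }
        return segments;
    }

    // Tiny percent-decoder; assumes well-formed %XX escapes and ASCII elsewhere.
    static String percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c);
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/a%2Fb/c");
        System.out.println(splitThenDecode(uri)); // [a/b, c]
        System.out.println(decodeThenSplit(uri)); // [a, b, c]
    }
}
```

The first form keeps two segments, one of which contains a literal slash; the 
second yields three segments, so it identifies a different resource.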
 

I understand that path normalization is reasonable, but path normalization 
should not change the resource referenced, which this does. Although it comes 
from a newer standard, [RFC 3986 Section 
6.2.2.2|https://tools.ietf.org/html/rfc3986#section-6.2.2.2] states:

 
{quote}6.2.2.2. Percent-Encoding Normalization 
 The percent-encoding mechanism (Section 2.1) is a frequent source of variance 
among otherwise identical URIs. In addition to the case normalization issue 
noted above, some URI producers percent-encode octets that do not require 
percent-encoding, resulting in URIs that are equivalent to their non-encoded 
counterparts. These URIs should be normalized by decoding any percent-encoded 
octet that corresponds to an unreserved character, as described in Section 2.3.
{quote}
The key here is that this section only calls for decoding unreserved 
characters, whereas here a reserved character (the forward slash, %2F) is 
being decoded, which changes the meaning of the URI. RFC 2396 doesn't provide 
this type of normalization guidance, but I do not see how decoding encoded 
reserved characters, and thereby changing the meaning of a URI, is the right 
behavior.
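
To illustrate what I mean, here is a rough sketch of percent-encoding 
normalization limited to the unreserved set from RFC 3986 Section 2.3 (ALPHA, 
DIGIT, "-", ".", "_", "~"). It is only an illustration under my own 
assumptions, not the URIUtils#rewriteURI implementation and not the attached 
patch; the class and method names are hypothetical:

```java
final class UnreservedNormalizationSketch {

    // Unreserved characters per RFC 3986 Section 2.3.
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')
                || (c >= 'a' && c <= 'f');
    }

    // Decode only escapes of unreserved characters; leave every other
    // escape (such as %2F) exactly as it was written.
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()
                    && isHex(raw.charAt(i + 1)) && isHex(raw.charAt(i + 2))) {
                int octet = Integer.parseInt(raw.substring(i + 1, i + 3), 16);
                if (isUnreserved(octet)) {
                    out.append((char) octet);
                } else {
                    out.append(raw, i, i + 3);
                }
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // %7E and %2D are unreserved and get decoded; %2F stays encoded.
        System.out.println(normalize("/a%2Fb/%7Euser%2Dx")); // /a%2Fb/~user-x
    }
}
```

With this kind of normalization the encoded slash survives, so the path still 
has the same segments before and after.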


> Encoded forward slashes are not preserved when rewriting URI
> ------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1968
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1968
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 4.5.7
>            Reporter: Jay Modi
>            Priority: Major
>         Attachments: rewrite_preserve_forward_slash.diff
>
>
> URIs that contain an encoded forward slash (%2F) are no longer preserved when 
> the HTTP client executes. I came across this when upgrading from 4.5.2 to 
> 4.5.7 and my requests that contained an encoded forward slash suddenly 
> started failing. This appears to be due to the decoding and re-encoding of 
> the path that takes place in the URIUtils#rewriteURI method. I've attached a 
> patch that restores the old behavior, but if a URI contains two slashes in a 
> row in addition to an encoded slash, the encoded forward slash will still be 
> decoded.


