[ https://issues.apache.org/jira/browse/HTTPCLIENT-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756479#comment-16756479 ]
Jay Modi edited comment on HTTPCLIENT-1968 at 1/30/19 7:35 PM:
---------------------------------------------------------------

My apologies; the document I was looking at was actually not RFC 2396 but an updated and expired draft: [https://tools.ietf.org/id/draft-fielding-uri-rfc2396bis-07.txt]

In [section 2.4.2|https://tools.ietf.org/html/rfc2396#section-2.4.2] of the actual RFC 2396, the following is stated:
{quote}2.4.2. When to Escape and Unescape

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics. Normally, the only time escape encodings can safely be made is when the URI is being created from its component parts; each component may have its own set of characters that are reserved, so only the mechanism responsible for generating or interpreting that component can determine whether or not escaping a character will change its semantics. Likewise, a URI must be separated into its components before the escaped characters within those components can be safely decoded.
{quote}

I understand that path normalization is reasonable, but path normalization should not change the resource referenced, which this does. Admittedly this comes from a newer standard, but [RFC 3986 Section 6.2.2.2|https://tools.ietf.org/html/rfc3986#section-6.2.2.2] states:
{quote}6.2.2.2. Percent-Encoding Normalization

The percent-encoding mechanism (Section 2.1) is a frequent source of variance among otherwise identical URIs. In addition to the case normalization issue noted above, some URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts. These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3.
{quote}

The key here is that a reserved character is being decoded, which changes the meaning of the URI.
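To make the distinction concrete, here is a minimal sketch of the normalization that RFC 3986 Section 6.2.2.2 describes: only percent-encoded octets that map to unreserved characters are decoded, and everything else (including %2F) is left alone. The class name PercentNormalization and its helper methods are hypothetical and written only for illustration; this is not the code in URIUtils#rewriteURI or the attached patch. The java.net.URI calls at the end are from the standard JDK and show how decoding the whole path changes its segments.

```java
import java.net.URI;

// Hypothetical illustration class; not part of HttpClient.
final class PercentNormalization {

    // Unreserved set per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Decodes only percent-encoded octets that correspond to unreserved
    // characters; every other triplet (for example %2F) is copied unchanged.
    static String normalize(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        int i = 0;
        while (i < rawPath.length()) {
            char ch = rawPath.charAt(i);
            if (ch == '%' && i + 2 < rawPath.length()) {
                int hi = Character.digit(rawPath.charAt(i + 1), 16);
                int lo = Character.digit(rawPath.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    int octet = hi * 16 + lo;
                    if (isUnreserved(octet)) {
                        out.append((char) octet);        // safe to decode
                    } else {
                        out.append(rawPath, i, i + 3);   // keep the escape as written
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(ch); // ordinary character or malformed escape: copy as-is
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "/a%2Fb/%7Euser";

        // Only %7E (an unreserved "~") is decoded; %2F survives.
        System.out.println(normalize(raw));        // /a%2Fb/~user

        URI uri = URI.create("http://example.com" + raw);
        System.out.println(uri.getRawPath());      // /a%2Fb/%7Euser
        // getPath() decodes every escape, so two segments become three.
        System.out.println(uri.getPath());         // /a/b/~user
    }
}
```

In this sketch the path "/a%2Fb/%7Euser" still has two segments after normalization, whereas the fully decoded form "/a/b/~user" has three and can therefore refer to a different resource.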
RFC 2396 doesn't provide this type of normalization standard, but I do not see how decoding encoded reserved characters, and thereby changing the meaning of a URI, is the right behavior.

> Encoded forward slashes are not preserved when rewriting URI
> ------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1968
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1968
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 4.5.7
>            Reporter: Jay Modi
>            Priority: Major
>         Attachments: rewrite_preserve_forward_slash.diff
>
>
> URIs that contain an encoded forward slash (%2F) are no longer preserved when
> the HTTP client executes. I came across this when upgrading from 4.5.2 to
> 4.5.7 and my requests that contained an encoded forward slash suddenly
> started failing. This appears to be due to the decoding and re-encoding of the
> path that takes place in the URIUtils#rewriteURI method. I've attached a
> patch that restores the old behavior, but if a URI contains two slashes in a
> row in addition to an encoded slash, the encoded forward slash will still be decoded.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)