[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882026#comment-17882026
 ] 

Xavier BOURGOUIN edited comment on HTTPCLIENT-2341 at 9/16/24 1:06 PM:
-----------------------------------------------------------------------

I believe such considerations are superseded by RFC 3986, section 2.2 
[https://www.rfc-editor.org/rfc/rfc3986#section-2.2], and in particular:
{noformat}
 URIs that differ in the replacement of a reserved character with its
   corresponding percent-encoded octet are not equivalent.
{noformat}
"@" is a reserved character, thus its percent-encoded representation (%40) is 
_not_ equivalent ...
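
To make the distinction concrete, here is a minimal sketch that uses only java.net.URI from the JDK. The host and bucket names are made up for illustration. It compares the two spellings by their raw form and shows that URI.normalize() leaves %40 untouched, whereas getPath() decodes it and so hides the difference.

```java
import java.net.URI;

public final class ReservedCharEquivalence {

    /** True when both URIs carry exactly the same raw (undecoded) path. */
    static boolean sameRawPath(URI a, URI b) {
        return a.getRawPath().equals(b.getRawPath());
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and bucket, for illustration only.
        URI encoded = URI.create("https://s3.example.com/bucket/foo%40bar.file");
        URI literal = URI.create("https://s3.example.com/bucket/foo@bar.file");

        // URI.equals() compares the raw components, so these differ.
        System.out.println("equals:        " + encoded.equals(literal));
        System.out.println("same raw path: " + sameRawPath(encoded, literal));

        // getPath() decodes escapes, so the distinction is lost here.
        System.out.println("same decoded:  " + encoded.getPath().equals(literal.getPath()));

        // normalize() only resolves "." and ".." segments; %40 is kept.
        System.out.println("normalized:    " + encoded.normalize().getRawPath());
    }
}
```

Any code that rebuilds a request URI from the decoded path rather than the raw path loses the distinction the RFC asks to preserve.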

In practice, we see multiple S3 servers (Scality, AWS, ...) effectively *not* 
treating *foo%40bar.file* and *f...@bar.file* as equivalent when accessing an 
object whose name is "f...@bar.file". For example, such a server rejects a 
presigned URL in which *foo%40bar.file* has been replaced with *f...@bar.file*, 
answering HTTP 403 Forbidden, probably because it considers the request to 
target a different resource than the one it presigned. That seems legitimate as 
per our understanding of RFC 3986.
[https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html]
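
As a rough illustration of why a rewritten path could end in a 403, here is a toy signer. It is an assumption-laden sketch and explicitly NOT AWS Signature Version 4: the class name, the secret, and the "method + newline + raw path" string-to-sign are all hypothetical. It only shows that a signature computed over the raw path stops verifying once %40 is swapped for "@".

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public final class ToyPresigner {

    /** Signs "METHOD\nrawPath" with HMAC-SHA256 (toy scheme, not SigV4). */
    static String sign(String secret, String method, String rawPath) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] sig = mac.doFinal((method + "\n" + rawPath).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Recomputes the signature for the path actually received and compares. */
    static boolean verify(String secret, String method, String receivedRawPath, String signature) {
        byte[] expected = sign(secret, method, receivedRawPath).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, signature.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String secret = "not-a-real-secret"; // hypothetical key
        String signature = sign(secret, "GET", "/bucket/foo%40bar.file");

        // The path as presigned still verifies.
        System.out.println(verify(secret, "GET", "/bucket/foo%40bar.file", signature));
        // The path after a client rewrote %40 to "@" does not.
        System.out.println(verify(secret, "GET", "/bucket/foo@bar.file", signature));
    }
}
```

Whether a given S3 implementation canonicalizes the path before verifying is up to that server; the sketch only covers the case where it signs the octets as received.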



> DefaultRedirect strategy breaks reserved chars in URI path
> ----------------------------------------------------------
>
>                 Key: HTTPCLIENT-2341
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2341
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.14
>         Environment: httpclient4 (4.5.14)
> Linux/Ubuntu 22.04
>            Reporter: Xavier BOURGOUIN
>            Priority: Major
>         Attachments: hc4normalize.tar.gz
>
>
> When an HTTP response has a URI in the Location header with percent-encoded 
> reserved chars (such as %40), these chars are replaced by their normalized 
> equivalent (which is "@" in the case of %40). This seems to contradict RFC 
> 3986 ([https://www.rfc-editor.org/rfc/rfc3986#section-2.2]), at least in the 
> sense that for such reserved characters, the percent-encoded value doesn't 
> have the same semantic meaning as the literal character, so the two aren't 
> to be interpreted as equivalent.
> One of the impacts is that it breaks any server / API that redirects clients 
> to an S3 blob object (AWS S3 for instance) that happens to contain a %40 
> in the URI path (ex: location: https://<endpoint>/<some blob 
> container>/foo%40bar.file).
> Disabling URI normalization as shown below seems to work around it:
> {code:java}
> HttpGet request = new HttpGet("http://service-that-redirects");
> // Turn off URI normalization for this request only
> request.setConfig(RequestConfig.custom().setNormalizeUri(false).build());
>  {code}
> However, I'm not sure that's satisfactory if, as we suspect above, it is 
> simply always wrong to "normalize" those reserved characters (plus 
> normalization is enabled by default).
> Note that httpclient5 is fine (the percent-encoded %40 is preserved as it 
> should be, and it seems there is no longer a toggle for the normalization 
> behavior anyway).
> Comparing httpclient 4.x vs 5.x, it seems the URI normalization utility isn't 
> the same, which might explain why httpclient5 has no issue: 
> https://github.com/apache/httpcomponents-client/blob/4.5.x/httpclient/src/main/java/org/apache/http/impl/client/DefaultRedirectStrategy.java#L163
> https://github.com/apache/httpcomponents-client/blob/5.3.x/httpclient5/src/main/java/org/apache/hc/client5/http/impl/DefaultRedirectStrategy.java#L116
> (org.apache.http.client.utils.URIUtils.normalize() for HC4, versus 
> java.net.URI.normalize for HC5)
>  
> This past ticket https://issues.apache.org/jira/browse/HTTPCLIENT-2271 was 
> discussing something very similar, except it was the other way around: some 
> reserved characters were replaced by their percent-encoded equivalent. 
> However, in the lengthy comment thread there, it seems a consensus was 
> finally reached that for such chars, the percent-encoded values aren't 
> equivalent to the original characters and thus shouldn't be transformed. So I 
> believe that reasoning should hold in both directions, and should also apply 
> to the case reported here.
> I worked out a reproducer in the form of a small maven project that I'm 
> attaching to this ticket, inspired by the one from that other ticket. It 
> demonstrates the issue for httpclient 4.5.14 (probably all 4.x behaves the 
> same) and compares it with httpclient5 (5.3.1). It should run directly with 
> _mvn exec:java_, and hopefully the output and code are clear enough to be 
> self-explanatory.
>  
> In essence, what it does is:
>  * Start a dummy HTTP server with two services: */foo*, which redirects to 
> */foo%40bar*, and */foo@bar*, which replies with HTTP 200.
>  * Test httpclient4 (along with some other clients to demonstrate the 
> differences in behavior) by sending a GET request to */foo* and observing 
> whether and how it follows the redirect toward */foo@bar*, which reveals 
> whether *%40* was replaced by *@*.
>  
> {code:java}
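> // Dummy server
> import com.sun.net.httpserver.HttpExchange;
> import com.sun.net.httpserver.HttpHandler;
> import com.sun.net.httpserver.HttpServer;
> import java.io.IOException;
> import java.io.OutputStream;
> import java.net.InetSocketAddress;
> import java.nio.charset.StandardCharsets;
>
> // The enclosing class name is illustrative
> public class DummyServer {
>     public static void main(String[] args) throws IOException, InterruptedException {
>         HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
>         server.createContext("/foo", new RedirectHttpHandler());
>         server.createContext("/foo@bar", new SuccessHttpHandler());
>         server.setExecutor(null); // use the default executor
>         server.start();
>
>         // [... test client requests]
>
>         // Stop the server only once the client tests have run
>         server.stop(0);
>     }
>
>     // Answers 302 with a Location header containing a percent-encoded "@"
>     public static class RedirectHttpHandler implements HttpHandler {
>         @Override
>         public void handle(HttpExchange t) throws IOException {
>             t.getResponseHeaders().add("Location", "/foo%40bar");
>             t.sendResponseHeaders(302, 0);
>             OutputStream os = t.getResponseBody();
>             os.close();
>         }
>     }
>
>     // Answers 200 and logs the request URI exactly as received
>     public static class SuccessHttpHandler implements HttpHandler {
>         @Override
>         public void handle(HttpExchange t) throws IOException {
>             System.out.println("[server] Received GET with URI: " + t.getRequestURI().toString());
>             byte[] response = "You followed the redirect!".getBytes(StandardCharsets.UTF_8);
>             t.sendResponseHeaders(200, response.length);
>             OutputStream os = t.getResponseBody();
>             os.write(response);
>             os.close();
>         }
>     }
> }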
> {code}
> And the httpclient4 test looks like this:
> {code:java}
> CloseableHttpClient client = HttpClients.createDefault();
> HttpGet httpget = new HttpGet("http://127.0.0.1:8000/foo");
> CloseableHttpResponse response = client.execute(httpget);
> int status = response.getStatusLine().getStatusCode();
> if (status == 302) {
>     // Redirect was not followed: show where the server pointed us
>     System.out.println("-> Location header: " + response.getFirstHeader("Location").getValue());
> } else if (status == 200) {
>     System.out.println("-> Followed the redirect!");
> } else {
>     throw new RuntimeException("Unexpected response code: " + status);
> }
> {code}
>  
>  
>  
>  
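
For reference, a sketch of what a reserved-character-preserving normalization could look like, following RFC 3986 section 6.2.2 (decode only percent-encoded unreserved characters, uppercase the hex digits of everything else). The class and method names are hypothetical; this is neither the HC4 URIUtils code nor a proposed patch, just an illustration of the behavior the ticket argues for.

```java
public final class PercentEncodingNormalizer {

    private static final String HEX = "0123456789ABCDEF";

    /** RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~". */
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Value of an ASCII hex digit, or -1 if the char is not one. */
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes escapes of unreserved chars only; all other escapes are kept. */
    static String normalize(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        int i = 0;
        while (i < rawPath.length()) {
            char c = rawPath.charAt(i);
            if (c == '%' && i + 2 < rawPath.length()) {
                int hi = hexValue(rawPath.charAt(i + 1));
                int lo = hexValue(rawPath.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int octet = (hi << 4) | lo;
                    if (isUnreserved(octet)) {
                        out.append((char) octet);          // e.g. %7E -> ~
                    } else {
                        out.append('%').append(HEX.charAt(hi)).append(HEX.charAt(lo));
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(c); // literal char, or a malformed escape left as-is
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("/foo%40bar"));       // reserved "@" stays encoded
        System.out.println(normalize("/foo%7Ebar%2dbaz")); // unreserved "~" and "-" are decoded
        System.out.println(normalize("/a%2fb"));           // reserved "/" stays encoded, hex uppercased
    }
}
```

With this rule the redirect target */foo%40bar* from the reproducer would be sent unchanged.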



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org
For additional commands, e-mail: dev-h...@hc.apache.org
