[ https://issues.apache.org/jira/browse/HTTPCLIENT-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882026#comment-17882026 ]
Xavier BOURGOUIN edited comment on HTTPCLIENT-2341 at 9/16/24 1:06 PM:
-----------------------------------------------------------------------

I believe such considerations are superseded by section 2.2 of RFC 3986 ([https://www.rfc-editor.org/rfc/rfc3986#section-2.2]), and in particular:
{noformat}
URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent.
{noformat}
"@" is a reserved character, thus its percent-encoded representation (%40) is _not_ equivalent ...

In practice, we see multiple S3 servers (Scality, AWS, ...) effectively *not* treating *foo%40bar.file* and *f...@bar.file* as equivalent when trying to access an object whose name is "f...@bar.file".

For example, such a server rejects a presigned URL in which *foo%40bar.file* has been replaced with *f...@bar.file*, answering with HTTP 403 Forbidden. It probably considers that the request is trying to access a different resource than the one it presigned, which seems legitimate as per our understanding of RFC 3986.

[https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html]


> DefaultRedirect strategy breaks reserved chars in URI path
> ----------------------------------------------------------
>
>                 Key: HTTPCLIENT-2341
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2341
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.14
>         Environment: httpclient4 (4.5.14)
>                      Linux/Ubuntu 22.04
>            Reporter: Xavier BOURGOUIN
>            Priority: Major
>         Attachments: hc4normalize.tar.gz
>
>
> When an HTTP response has a URI in the Location header with percent-encoded
> reserved chars (such as %40), these chars are replaced by their decoded
> form (which is "@" in the case of %40). This seems to contradict RFC 3986
> ([https://www.rfc-editor.org/rfc/rfc3986#section-2.2]), at least in the
> sense that, for such reserved characters, the percent-encoded form doesn't
> have the same semantic meaning as the literal character, so the two aren't
> to be interpreted as equivalent.
> One of the impacts is that it breaks any server / API that redirects clients
> to an S3 blob object (AWS S3 for instance) that happens to contain a %40
> in the URI path (ex: location: https://<endpoint>/<some blob
> container>/foo%40bar.file)
> Disabling URI normalization as shown below seems to work around it:
> {code:java}
> HttpGet httpget = new HttpGet("http://service-that-redirects");
> httpget.setConfig(RequestConfig.custom().setNormalizeUri(false).build());
> {code}
> However, I'm not sure that's satisfying if, as we suspect above, it is
> always wrong to "normalize" those reserved characters (plus normalization
> is enabled by default).
> Note that httpclient5 is fine (the percent-encoded %40 is preserved as it
> should be, and it seems there's no longer a toggle for the normalization
> behavior anyway).
> Comparing httpclient 4.x vs 5.x, it seems the URI normalization utility isn't
> the same, which might explain why httpclient5 has no issue:
> https://github.com/apache/httpcomponents-client/blob/4.5.x/httpclient/src/main/java/org/apache/http/impl/client/DefaultRedirectStrategy.java#L163
> https://github.com/apache/httpcomponents-client/blob/5.3.x/httpclient5/src/main/java/org/apache/hc/client5/http/impl/DefaultRedirectStrategy.java#L116
> (org.apache.http.client.utils.URIUtils.normalize() for HC4, versus
> java.net.URI.normalize for HC5)
>
> This past ticket https://issues.apache.org/jira/browse/HTTPCLIENT-2271 was
> discussing something very similar, except it was the other way around: some
> reserved characters were replaced by their percent-encoded equivalent.
> However, in the lengthy comment thread there, it seems a consensus was
> finally reached that for such chars, the percent-encoded value isn't
> equivalent to the original value and thus shouldn't be transformed. So I
> believe that reasoning should hold in both directions, and should also
> apply to the case reported here.
> I worked out a reproducer in the form of a little maven project that I'm
> attaching to this ticket, inspired by the one from that other ticket. It
> demonstrates the issue for httpclient 4.5.14 (but probably all 4.x behaves
> the same), and compares it with httpclient5 (5.3.1). It should run directly
> with _mvn exec:java_ and hopefully the output and code content are clear
> enough to be self-explanatory.
>
> In essence what it does is:
> * Start a dummy http server with two services: */foo*, which redirects to
> */foo%40bar*, and one that listens on */foo@bar* and replies with HTTP 200.
> * Test httpclient4 (along with some other clients to demonstrate the
> differences in behavior) by sending a GET request to */foo* and observing
> if and how it follows the redirect toward */foo@bar*, which shows whether
> *%40* was replaced by *@*
>
> {code:java}
> import com.sun.net.httpserver.HttpExchange;
> import com.sun.net.httpserver.HttpHandler;
> import com.sun.net.httpserver.HttpServer;
> import java.io.IOException;
> import java.io.OutputStream;
> import java.net.InetSocketAddress;
>
> // Dummy server (members of the reproducer's main class)
> public static void main(String[] args) throws IOException, InterruptedException {
>     HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
>     server.createContext("/foo", new RedirectHttpHandler());
>     server.createContext("/foo@bar", new SuccessHttpHandler());
>     server.setExecutor(null);
>     server.start();
>
>     // [... test client requests]
>
>     // Stop the server only once the client tests are done
>     server.stop(0);
> }
>
> // Answers every request with a 302 whose Location keeps %40 percent-encoded
> public static class RedirectHttpHandler implements HttpHandler {
>     @Override
>     public void handle(HttpExchange t) throws IOException {
>         t.getResponseHeaders().add("Location", "/foo%40bar");
>         t.sendResponseHeaders(302, 0);
>         OutputStream os = t.getResponseBody();
>         os.close();
>     }
> }
>
> // Logs the request URI as received, then answers with HTTP 200
> public static class SuccessHttpHandler implements HttpHandler {
>     @Override
>     public void handle(HttpExchange t) throws IOException {
>         System.out.println("[server] Received GET with URI: " + t.getRequestURI().toString());
>         byte[] response = "You followed the redirect!".getBytes();
>         t.sendResponseHeaders(200, response.length);
>         OutputStream os = t.getResponseBody();
>         os.write(response);
>         os.close();
>     }
> }
> {code}
> And the httpclient4 test looks like this:
> {code:java}
> CloseableHttpClient client = HttpClients.createDefault();
> HttpGet httpget = new HttpGet("http://127.0.0.1:8000/foo");
> CloseableHttpResponse response = client.execute(httpget);
> if (response.getStatusLine().getStatusCode() == 302) {
>     System.out.println("-> Location header: " + response.getFirstHeader("Location").getValue());
> } else if (response.getStatusLine().getStatusCode() == 200) {
>     System.out.println("-> Followed the redirect!");
> } else {
>     throw new RuntimeException("Unexpected response code: " + response.getStatusLine().getStatusCode());
> }
> {code}
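
As a side note to the HC4 versus HC5 comparison in the description, here is a minimal JDK-only sketch of how java.net.URI (the class the description says HC5 relies on for normalization) treats a percent-encoded "@". It does not exercise HttpClient or URIUtils.normalize() at all; the class name ReservedCharDemo and its helper methods are hypothetical and exist only for this illustration.

```java
import java.net.URI;

public class ReservedCharDemo {

    // java.net.URI.normalize() only collapses "." and ".." path segments;
    // getRawPath() returns the path without decoding percent-escapes.
    static String normalizedRawPath(String location) {
        return URI.create(location).normalize().getRawPath();
    }

    // java.net.URI.equals() compares components without decoding "%40" to "@".
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        String encoded = "http://127.0.0.1:8000/foo%40bar";
        String literal = "http://127.0.0.1:8000/foo@bar";

        // Dot segments are removed, the escape is left untouched
        System.out.println(normalizedRawPath("http://127.0.0.1:8000/a/./b/../foo%40bar")); // /a/foo%40bar
        System.out.println(normalizedRawPath(encoded));                                    // /foo%40bar

        // The encoded and literal forms are distinct URIs for java.net.URI
        System.out.println(sameUri(encoded, literal));                                     // false

        // Decoding happens only when explicitly asking for the decoded path
        System.out.println(URI.create(encoded).getPath());                                 // /foo@bar
    }
}
```

If this sketch is representative, a redirect strategy that only applies java.net.URI.normalize() and then sends the raw path would leave */foo%40bar* intact, which is consistent with the httpclient5 behavior reported above.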