[ https://issues.apache.org/jira/browse/HTTPCLIENT-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xavier BOURGOUIN updated HTTPCLIENT-2341:
-----------------------------------------
    Description: 
When an HTTP response carries a URI in the Location header with percent-encoded reserved characters (such as %40), these characters are replaced by their decoded form ("@" in the case of %40). This seems to contradict RFC 3986 ([https://www.rfc-editor.org/rfc/rfc3986#section-2.2]), at least in the sense that, for reserved characters, the percent-encoded form does not have the same semantic meaning as the literal character, so the two are not to be interpreted as equivalent.

One of the impacts is that it breaks any server / API that redirects clients to a blob object (on AWS S3, for instance) whose URI path happens to contain a %40 (e.g. location: https://<endpoint>/<some blob container>/foo%40bar.file).

Disabling URI normalization as shown below seems to work around it:
{code:java}
HttpGet httpget = new HttpGet("http://service-that-redirects");
httpget.setConfig(RequestConfig.custom().setNormalizeUri(false).build());
{code}
However, I'm not sure that is satisfactory if, as suspected above, it is simply always wrong to "normalize" those reserved characters (and normalization is enabled by default).

Note that httpclient5 is fine (the percent-encoded %40 is preserved as it should be, and it seems there is no longer a toggle for the normalization behavior anyway).

Comparing httpclient 4.x with 5.x, it seems the URI normalization utility isn't the same, which might explain why httpclient5 has no issue:
https://github.com/apache/httpcomponents-client/blob/4.5.x/httpclient/src/main/java/org/apache/http/impl/client/DefaultRedirectStrategy.java#L163
https://github.com/apache/httpcomponents-client/blob/5.3.x/httpclient5/src/main/java/org/apache/hc/client5/http/impl/DefaultRedirectStrategy.java#L116

This past ticket https://issues.apache.org/jira/browse/HTTPCLIENT-2271 discussed something very similar, except it was the other way around: some reserved characters were replaced by their percent-encoded equivalent. However, in the lengthy comment thread there, it seems a consensus was finally reached that, for such characters, the percent-encoded value is not equivalent to the original value and thus should not be transformed. So I believe that reasoning should hold in both directions and should also apply to the case reported here.

I worked out a reproducer in the form of a small Maven project that I'm attaching to this ticket, inspired by the one from that other ticket. It demonstrates the issue for httpclient 4.5.14 (all of 4.x is probably the same) and compares it with httpclient5 (5.3.1). It should run directly with _mvn exec:java_, and hopefully the output and code are clear enough to be self-explanatory.

In essence, what it does is:
* Start a dummy HTTP server with two services: */foo*, which redirects to */foo%40bar*, and one that listens on */foo@bar* and replies with HTTP 200.
* Test httpclient4 (along with some other clients, to demonstrate the differences in behavior) by sending a GET request to */foo* and observing whether and how it follows the redirect toward */foo@bar*, which shows whether *%40* was replaced by *@*.

{code:java}
// Dummy server
public static void main(String[] args) throws IOException, InterruptedException {
    HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
    server.createContext("/foo", new RedirectHttpHandler());
    server.createContext("/foo@bar", new SuccessHttpHandler());
    server.setExecutor(null); // use the default executor
    server.start();

    // [... test client requests]

    // Stop the server only once the client tests have run
    server.stop(0);
}

public static class RedirectHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        // Redirect to a path containing a percent-encoded reserved character
        t.getResponseHeaders().add("Location", "/foo%40bar");
        t.sendResponseHeaders(302, 0);
        OutputStream os = t.getResponseBody();
        os.close();
    }
}

public static class SuccessHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        // Log the URI as received, to see whether %40 survived the redirect
        System.out.println("[server] Received GET with URI: " + t.getRequestURI().toString());
        String response = "You followed the redirect!";
        t.sendResponseHeaders(200, response.length());
        OutputStream os = t.getResponseBody();
        os.write(response.getBytes());
        os.close();
    }
}
{code}
And the httpclient4 test looks like this:
{code:java}
try (CloseableHttpClient client = HttpClients.createDefault()) {
    HttpGet httpget = new HttpGet("http://127.0.0.1:8000/foo");
    try (CloseableHttpResponse response = client.execute(httpget)) {
        int status = response.getStatusLine().getStatusCode();
        if (status == 302) {
            // The redirect was not followed: show where the server pointed
            System.out.println("-> Location header: " + response.getFirstHeader("Location").getValue());
        } else if (status == 200) {
            System.out.println("-> Followed the redirect!");
        } else {
            throw new RuntimeException("Unexpected response code: " + status);
        }
    }
}
{code}

> DefaultRedirect strategy breaks reserved chars in URI path
> ----------------------------------------------------------
>
>                 Key: HTTPCLIENT-2341
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2341
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.14
>         Environment: httpclient4 (4.5.14)
>                      Linux/Ubuntu 22.04
>            Reporter: Xavier BOURGOUIN
>            Priority: Major
>         Attachments: hc4normalize.tar.gz
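
For reference, the distinction the description relies on (that %40 and "@" in a path are not interchangeable) can be sketched with plain java.net.URI. This is only an illustration under stated assumptions: the class name PercentEncodingDemo and its helper methods are hypothetical, and the code does not reproduce HttpClient's own normalization path. It just shows how rebuilding a URI from its decoded components loses the percent-encoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of HttpClient or of the attached reproducer.
final class PercentEncodingDemo {

    // Path exactly as it appears on the wire (percent-encoding preserved).
    static String rawPath(String location) {
        return URI.create(location).getRawPath();
    }

    // Path with percent-encoded octets decoded.
    static String decodedPath(String location) {
        return URI.create(location).getPath();
    }

    // Rebuilds the URI from decoded components, the way a lossy
    // normalizer might: the original %40 cannot be recovered afterwards.
    static String rebuiltFromDecoded(String location) {
        URI u = URI.create(location);
        try {
            return new URI(u.getScheme(), u.getAuthority(), u.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String location = "http://127.0.0.1:8000/foo%40bar";
        System.out.println("raw path:     " + rawPath(location));
        System.out.println("decoded path: " + decodedPath(location));
        System.out.println("rebuilt URI:  " + rebuiltFromDecoded(location));
    }
}
```

A redirect handler that wants to keep the Location value intact would work from the raw form rather than from the decoded components.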