[ https://issues.apache.org/jira/browse/HTTPCLIENT-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xavier BOURGOUIN updated HTTPCLIENT-2341:
-----------------------------------------
    Description:
When an HTTP response has a URI in the Location header that contains percent-encoded reserved characters (such as %40), these characters are replaced by their "normalized" (decoded) equivalent, which is "@" in the case of %40. This seems to contradict RFC 3986 ([https://www.rfc-editor.org/rfc/rfc3986#section-2.2]), at least in the sense that a reserved character and its percent-encoded form do not have the same semantic meaning and thus are not to be treated as equivalent.

One of the impacts is that it breaks any server or API that redirects clients to a blob object (on AWS S3, for instance) whose URI path happens to contain %40 (e.g. location: https://<endpoint>/<some blob container>/foo%40bar.file).

Disabling URI normalization as shown below seems to work around it:
{code:java}
// setConfig() returns void, so the request has to be created first
HttpGet httpget = new HttpGet("http://service-that-redirects");
httpget.setConfig(RequestConfig.custom().setNormalizeUri(false).build());
{code}
However, I'm not sure that is satisfactory if, as suspected above, it is simply always wrong to "normalize" those reserved characters (and normalization is enabled by default).

Note that httpclient5 is fine (the percent-encoded %40 is preserved, as it should be, and it seems there is no longer a toggle for the normalization behavior).

An earlier ticket, https://issues.apache.org/jira/browse/HTTPCLIENT-2271, discussed something very similar, except that it was the other way around: some reserved characters were replaced by their percent-encoded equivalent. However, in the lengthy comment thread there, it seems a consensus was finally reached that, for such characters, the percent-encoded value is not equivalent to the original value and thus should not be transformed. So I believe that reasoning should hold in both directions, and should also apply to the case reported here.
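To make the distinction above concrete, here is a minimal sketch that uses only the JDK's java.net.URI, not HttpClient itself. The class name EncodedPathDemo and the storage.example.com location are made up for illustration. It only shows that the raw path keeps %40, that decoding turns it into "@", and that java.net.URI's own normalize() leaves the escape alone; it does not reproduce the httpclient4 redirect behavior.

```java
import java.net.URI;

public class EncodedPathDemo {

    /** Returns the path exactly as written in the URI, with percent-escapes intact. */
    public static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    /** Returns the path with percent-escapes decoded; "%40" and "@" become indistinguishable. */
    public static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    /** Applies java.net.URI.normalize(), which only resolves "." and ".." segments. */
    public static String normalized(String uri) {
        return URI.create(uri).normalize().toString();
    }

    public static void main(String[] args) {
        // Hypothetical redirect target; host and path are assumptions for illustration.
        String location = "https://storage.example.com/container/foo%40bar.file";

        System.out.println("raw path:     " + rawPath(location));
        System.out.println("decoded path: " + decodedPath(location));
        System.out.println("normalized:   " + normalized(location));
    }
}
```

A client that rebuilds the redirect request from the decoded path would send "/container/foo@bar.file", whereas one that reuses the raw path sends the Location value unchanged.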

I worked out a reproducer in the form of a small Maven project, attached to this ticket and inspired by the one from that other ticket. It demonstrates the issue with httpclient 4.5.14 (all 4.x versions probably behave the same) and compares it with httpclient5 (5.3.1). It should run directly with `mvn exec:java`.

In essence, what it does is:
* Start a dummy HTTP server with two services: '{*}/foo{*}', which redirects to '{*}/foo%40bar{*}', and one that listens on '{*}/foo@bar{*}'
* Test httpclient4 (along with some other clients, to demonstrate the differences in behavior) by sending a GET request to '/foo' and observing whether and how it follows the redirect toward '/foo@bar', which shows whether *%40* was replaced by *@*

{code:java}
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Dummy server
public static void main(String[] args) throws IOException, InterruptedException {
    HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
    server.createContext("/foo", new RedirectHttpHandler());
    server.createContext("/foo@bar", new SuccessHttpHandler());
    server.setExecutor(null); // use the default executor
    server.start();

    // [... test client requests]

    server.stop(0); // stop only once the test clients have run
}

// Answers every request with a 302 whose Location contains a percent-encoded "@"
public static class RedirectHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        t.getResponseHeaders().add("Location", "/foo%40bar");
        t.sendResponseHeaders(302, 0);
        OutputStream os = t.getResponseBody();
        os.close();
    }
}

// Logs the request URI as received, so the server side shows whether %40 survived
public static class SuccessHttpHandler implements HttpHandler {
    @Override
    public void handle(HttpExchange t) throws IOException {
        System.out.println("[server] Received GET with URI: " + t.getRequestURI().toString());
        byte[] response = "You followed the redirect!".getBytes(StandardCharsets.UTF_8);
        t.sendResponseHeaders(200, response.length);
        OutputStream os = t.getResponseBody();
        os.write(response);
        os.close();
    }
}
{code}
And the httpclient4 test looks like this:
{code:java}
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

HttpGet httpget = new HttpGet("http://127.0.0.1:8000/foo");
// try-with-resources closes both the client and the response
try (CloseableHttpClient client = HttpClients.createDefault();
     CloseableHttpResponse response = client.execute(httpget)) {
    int status = response.getStatusLine().getStatusCode();
    if (status == 302) {
        // The redirect was not followed: print where it pointed
        System.out.println("-> Location header: " + response.getFirstHeader("Location").getValue());
    } else if (status == 200) {
        System.out.println("-> Followed the redirect!");
    } else {
        throw new RuntimeException("Unexpected response code: " + status);
    }
}
{code}

> DefaultRedirect strategy breaks reserved chars in URI path
> ----------------------------------------------------------
>
>                 Key: HTTPCLIENT-2341
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2341
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.14
>         Environment: httpclient4 (4.5.14)
>                      Linux/Ubuntu 22.04
>            Reporter: Xavier BOURGOUIN
>            Priority: Major
>         Attachments: hc4normalize.tar.gz
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org
For additional commands, e-mail: dev-h...@hc.apache.org