[ https://issues.apache.org/jira/browse/HTTPCLIENT-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937115#comment-17937115 ]
ASF subversion and git services commented on HTTPCLIENT-2363:
-------------------------------------------------------------

Commit d91eeb35ad70f08e72a92103f3d9ba0475d42f6c in httpcomponents-client's branch refs/heads/HTTPCLIENT-2363 from Oleg Kalnichevski
[ https://gitbox.apache.org/repos/asf?p=httpcomponents-client.git;h=d91eeb35a ]

HTTPCLIENT-2363: ensure requests have a scheme and an authority populated before they get committed to the execution pipeline

> execute(HttpHost, HttpRequest, ResponseHandler) adds port to Host header
> while execute(HttpRequest, ResponseHandler) does not
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2363
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2363
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 5.3.1, 5.4.2
>            Reporter: Nicholas O'Connor
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I've found what I think is a bug, but could also be expected behavior that's
> surprising from the user's perspective.
> [https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9]
>
> Expected behavior: assume we have some URI
> {{https://www.example.com/some/path}}. {{HttpClient}} provides
> overloads for execute that allow the URI to be split into host and path
> components ("{{https://www.example.com}}", "{{/some/path}}"), or provided
> all in the same {{HttpRequest}} (where {{request.getAuthority()}} is
> "{{https://example.com}}" and {{request.getUri()}} is "/some/path").
> Using either of these two methods provides the exact same result.
>
> Actual behavior: {{execute(HttpHost, HttpRequest, ResponseHandler)}} sets the
> Host header to {{www.example.com:443}}, while
> {{execute(HttpRequest, ResponseHandler)}} sets it to {{www.example.com}}.
>
> Normally, this behavior has no effect. In fact,
> https://echo.free.beeceptor.com will strip the port in the Host header
> when echoing back the headers in a request. However, I've recently come
> across a server that rejected some requests with
> "Invalid host header, this site must be accessed as https://www.example.com".
> Investigation revealed that it rejected requests where the port was included
> in the Host header, and would only accept requests where a port was not
> defined.
>
> This behavior is not defined by the HTTP spec; the port number is not
> required in the Host header sent by the client, nor is the server obligated
> to respect the host portion without the port. This case feels like an outlier
> from usual behavior; however, this hidden behavior from {{HttpClient}} was
> unexpected.
>
> It appears that this happens when {{ProtocolExec}}, {{AsyncProtocolExec}},
> and {{MinimalHttpClient}} are filling in the authority and scheme for a
> request if it didn't have one to begin with. Because they fill from the
> {{HttpRoute}}'s target {{HttpHost}}, this host also contains port
> information (usually scheme-default) when it is set as the request's
> authority.
>
> This bug is very easily worked around by simply setting the request's
> authority from the target before calling execute, but it still seems unusual.
> Was this behavior intended?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
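
For readers skimming the report, the two Host header values it describes can be reproduced with a small standalone sketch. This is only an illustration of the difference, not HttpClient's implementation: the class `HostHeaderSketch` and its methods are hypothetical names, and the sketch assumes that the only default ports of interest are 80 for http and 443 for https.

```java
// Illustrative sketch only; not HttpClient code. All names here are hypothetical.
final class HostHeaderSketch {

    // Scheme-default port for the two schemes discussed; -1 for anything else.
    static int defaultPort(String scheme) {
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;
        }
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;
        }
        return -1;
    }

    // Models the value reported for execute(HttpHost, HttpRequest, ResponseHandler):
    // the authority is filled from a target host whose port has been resolved,
    // so the port appears even when it is the scheme default.
    static String withResolvedPort(String scheme, String host, int port) {
        int effective = port > 0 ? port : defaultPort(scheme);
        return effective > 0 ? host + ":" + effective : host;
    }

    // Models the value reported for execute(HttpRequest, ResponseHandler):
    // a missing or scheme-default port is left out of the Host value.
    static String withoutDefaultPort(String scheme, String host, int port) {
        if (port <= 0 || port == defaultPort(scheme)) {
            return host;
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(withResolvedPort("https", "www.example.com", -1));
        System.out.println(withoutDefaultPort("https", "www.example.com", 443));
    }
}
```

Under these assumptions the first call yields `www.example.com:443` and the second yields `www.example.com`, which are the two values the reporter observed. A non-default port such as 8443 appears in both forms.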