[ https://issues.apache.org/jira/browse/HTTPCLIENT-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937116#comment-17937116 ]
Oleg Kalnichevski commented on HTTPCLIENT-2363:
-----------------------------------------------

[~abernal] What HttpClient does is not wrong or non-conformant with the specification; it is inconsistent. We should fix the inconsistency.

[~earthturtle] [~abernal] Please review the proposed fix:
https://github.com/apache/httpcomponents-client/compare/5.4.x...HTTPCLIENT-2363

Oleg

> execute(HttpHost, HttpRequest, ResponseHandler) adds port to Host header
> while execute(HttpRequest, ResponseHandler) does not
> -----------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2363
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2363
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 5.3.1, 5.4.2
>            Reporter: Nicholas O'Connor
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I've found what I think is a bug, but it could also be expected behavior
> that is surprising from the user's perspective.
> https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9
>
> Expected behavior: assume we have some URI https://www.example.com/some/path.
> HttpClient provides overloads of execute that allow the URI to be split into
> host and path components ("https://www.example.com", "/some/path"), or
> provided all in the same HttpRequest (where request.getAuthority() is
> "https://example.com" and request.getUri() is "/some/path"). Using either
> of these two methods should produce exactly the same result.
>
> Actual behavior: execute(HttpHost, HttpRequest, ResponseHandler) sets the
> Host header to www.example.com:443, while execute(HttpRequest,
> ResponseHandler) sets it to www.example.com.
>
> Normally, this behavior has no effect.
> In fact, https://echo.free.beeceptor.com strips the port from the Host
> header when echoing back the headers of a request. However, I've recently
> come across a server that rejected some requests with "Invalid host header,
> this site must be accessed as https://www.example.com". Investigation
> revealed that it rejected requests where the port was included in the Host
> header, and would only accept requests where no port was given.
>
> This behavior is not defined by the HTTP spec; the port number is not
> required in the Host header sent by the client, nor is the server obligated
> to respect the host portion without the port. This case feels like an outlier
> from usual behavior; however, this hidden behavior from HttpClient was
> unexpected.
>
> It appears that this happens when ProtocolExec, AsyncProtocolExec, and
> MinimalHttpClient fill in the authority and scheme for a request that did
> not have them to begin with. Because they fill these from the HttpRoute's
> target HttpHost, which also carries port information (usually the scheme
> default), that port ends up in the request's authority.
>
> This bug is easily worked around by setting the request's authority from the
> target before calling execute, but it still seems unusual. Was this behavior
> intended?
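The inconsistency comes down to whether the scheme-default port is written into the Host header. As a minimal sketch of a normalization under which both execute overloads would agree, the hypothetical helper below (HostHeaderSketch and its methods are illustrative names, not part of HttpClient and not the code in the proposed fix branch) drops the port whenever it equals the scheme's default, in line with the scheme-based normalization guidance of RFC 3986, section 6.2.3, and keeps it otherwise. It uses only the JDK.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, NOT an HttpClient API: derives a Host header value
// from a target's scheme, host name and port, omitting the port when it is
// the scheme's default (RFC 3986, section 6.2.3).
public final class HostHeaderSketch {

    private HostHeaderSketch() {
    }

    // Default ports for the two schemes HttpClient normally handles;
    // -1 means "no known default", so any explicit port is kept.
    static int defaultPort(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
                return 80;
            case "https":
                return 443;
            default:
                return -1;
        }
    }

    // A negative port means "not specified", as with java.net.URI.getPort().
    static String hostHeader(String scheme, String host, int port) {
        if (port < 0 || port == defaultPort(scheme)) {
            return host;
        }
        return host + ":" + port;
    }

    // The same normalization applied to a full request URI.
    static String hostHeader(URI uri) {
        return hostHeader(uri.getScheme(), uri.getHost(), uri.getPort());
    }

    public static void main(String[] args) {
        // Request-only path: the URI carries no explicit port.
        URI uri = URI.create("https://www.example.com/some/path");
        System.out.println(hostHeader(uri));
        // Target-host path: the target carries the resolved default port 443.
        System.out.println(hostHeader("https", "www.example.com", 443));
        // A non-default port is still sent.
        System.out.println(hostHeader("https", "www.example.com", 8443));
    }
}
```

Run as-is, it prints www.example.com twice and then www.example.com:8443: the request-only URI and the target with an explicit 443 yield the same Host value, while a non-default port is preserved.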