[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937116#comment-17937116
 ] 

Oleg Kalnichevski commented on HTTPCLIENT-2363:
-----------------------------------------------

[~abernal] What HttpClient does is not wrong or non-conformant with the 
specification; it is inconsistent. We should fix the inconsistency.

[~earthturtle] [~abernal] Please review the proposed fix:

[https://github.com/apache/httpcomponents-client/compare/5.4.x...HTTPCLIENT-2363]

 

Oleg

> execute(HttpHost, HttpRequest, ResponseHandler) adds port to Host header 
> while execute(HttpRequest, ResponseHandler) does not
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-2363
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2363
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 5.3.1, 5.4.2
>            Reporter: Nicholas O'Connor
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I've found what I think is a bug, but could also be expected behavior that's 
> surprising from the user's perspective.
> [https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9]
> Expected behavior: assume we have some URI 
> {{https://www.example.com/some/path}}. {{HttpClient}} provides 
> overloads for execute that allow the URI to be split into host and path 
> components ("{{https://www.example.com}}", "{{/some/path}}"), or provided 
> all in the same {{HttpRequest}} (where {{request.getAuthority()}} is 
> "{{https://example.com}}" and {{request.getUri()}} is "/some/path"). 
> Using either of these two methods provides the exact same result.
>  
> Actual behavior: {{execute(HttpHost, HttpRequest, ResponseHandler)}} sets the 
> Host header to {{www.example.com:443}}, while 
> {{execute(HttpRequest, ResponseHandler)}} sets it to {{www.example.com}}.
>  
> Normally, this behavior has no effect. In fact, 
> [https://echo.free.beeceptor.com] will strip the port in the Host header 
> when echoing back the headers in a request. However, I've recently come 
> across a server that rejected some requests with "Invalid host header, 
> this site must be accessed as https://www.example.com". Investigation 
> revealed that it rejected requests where the port was included in the 
> Host header, and would only accept requests where a port was not defined.
>  
> This behavior is not defined by the HTTP spec; the port number is not 
> required in the Host header sent by the client, nor is the server obligated 
> to respect the host portion without the port. This case feels like an outlier 
> from usual behavior; however, this hidden behavior from {{HttpClient}} was 
> unexpected.
>  
> It appears that this happens when {{ProtocolExec}}, 
> {{AsyncProtocolExec}}, and {{MinimalHttpClient}} are filling in the 
> authority and scheme for a request if it didn't have one to begin with. 
> Because they fill from the {{HttpRoute}}'s target {{HttpHost}}, this 
> host also contains port information (usually scheme-default) when it is set 
> as the request's authority.
>  
> This bug is very easily worked around by simply setting the request's 
> authority from the target before calling execute, but it still seems unusual. 
> Was this behavior intended?
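
For readers following the thread, the two Host header forms described in the report can be sketched in plain Java. This is only an illustration and not HttpClient's actual code: the class HostHeaderSketch and its methods are hypothetical names, and the sketch assumes only the well-known default ports 80 (http) and 443 (https).

```java
// Illustrative sketch only; NOT how HttpClient builds the Host header.
// All names here (HostHeaderSketch, withPort, withoutDefaultPort) are hypothetical.
final class HostHeaderSketch {

    /** Returns the scheme-default port, or -1 if the scheme is not http/https. */
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        return -1;
    }

    /** Host value that keeps any explicitly set port, even a scheme-default one. */
    static String withPort(String host, int port) {
        return port >= 0 ? host + ":" + port : host;
    }

    /** Host value that drops the port when it is unset or scheme-default. */
    static String withoutDefaultPort(String scheme, String host, int port) {
        if (port < 0 || port == defaultPort(scheme)) return host;
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // Form reported for execute(HttpHost, HttpRequest, ResponseHandler)
        System.out.println(withPort("www.example.com", 443));
        // Form reported for execute(HttpRequest, ResponseHandler)
        System.out.println(withoutDefaultPort("https", "www.example.com", 443));
        // A non-default port is kept in either form
        System.out.println(withoutDefaultPort("https", "www.example.com", 8443));
    }
}
```

Both forms identify the same endpoint when the port is the scheme default, which is why most servers accept either; the server described in the report is one that does not.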



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
