RFR: 8255244: HttpClient: Response headers contain incorrectly encoded Unicode characters

Daniel Fuchs Wed, 11 Nov 2020 08:52:57 -0800

The HTTP/1.1 Header Parser of the new HttpClient currently assumes that all 
headers (names and values) are US-ASCII and as a result mis-decodes any byte 
whose value is > 127. For instance, 0x80 (128) gets decoded as U+FF80 (65408) 
instead of being either rejected or decoded as U+0080.
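
To illustrate where a value like U+FF80 can come from, here is a minimal 
sketch. It assumes the mis-decoding is the result of casting a signed Java 
`byte` directly to `char`; the class and method names are hypothetical and are 
not taken from the parser source.

```java
public class HeaderByteDecoding {

    // Direct cast: a byte above 127 is negative in Java, so it is
    // sign-extended before being narrowed to char, which sets the
    // high byte of the resulting char to 0xFF.
    static char signExtended(byte b) {
        return (char) b;
    }

    // Masking keeps only the low 8 bits, which matches the ISO-8859-1
    // mapping of a byte to the code point with the same numeric value.
    static char latin1(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.printf("U+%04X%n", (int) signExtended(b)); // U+FF80
        System.out.printf("U+%04X%n", (int) latin1(b));       // U+0080
    }
}
```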


Historically, HTTP has allowed field content with text in the ISO-8859-1 
charset.  The ISO-8859-1 charset is also supported by `HttpURLConnection`.

We could decide to reject responses whose headers contain non US-ASCII 
characters out of hand, but for compatibility reasons, it seems preferable to 
accept any byte > 127 in header values and interpret it as an ISO-8859-1 
(Latin 1) character.
This change therefore proposes to update the HTTP/1.1 Header Parser to support 
ISO-8859-1 encoding.
The HTTP/1.1 Header Parser will now apply the same validation that is already 
applied by the HTTP/2 stack.
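
As a rough sketch of why ISO-8859-1 is a compatible choice, the example below 
decodes the raw bytes of a header value with `StandardCharsets.ISO_8859_1`. 
US-ASCII bytes are left unchanged, each byte > 127 maps to exactly one 
character, and the original bytes can be recovered. The class and helper names 
are hypothetical; this is not the code from the patch and it does not perform 
any header validation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1HeaderValue {

    // Hypothetical helper: interpret raw header value bytes as ISO-8859-1.
    static String decodeValue(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf" in US-ASCII followed by 0xE9 (e with acute accent in Latin 1).
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        String value = decodeValue(raw);

        System.out.println(value.length());                  // 4
        System.out.printf("U+%04X%n", (int) value.charAt(3)); // U+00E9

        // Encoding back with the same charset yields the original bytes.
        byte[] roundTrip = value.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(raw, roundTrip));    // true
    }
}
```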

-------------

Commit messages:
 - 8255244: HttpClient: Response headers contain incorrectly encoded Unicode 
characters

Changes: https://git.openjdk.java.net/jdk/pull/1169/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1169&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255244
  Stats: 561 lines in 6 files changed: 535 ins; 0 del; 26 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1169.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1169/head:pull/1169

PR: https://git.openjdk.java.net/jdk/pull/1169
