[ https://issues.apache.org/jira/browse/HTTPCLIENT-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801309#comment-17801309 ]
Oleg Kalnichevski commented on HTTPCLIENT-2159:
-----------------------------------------------

[~michael-o] Can this issue be closed as fixed by [https://github.com/apache/httpcomponents-core/pull/375] ?

Oleg

> Invalid handling of charset content type parameter
> --------------------------------------------------
>
>                 Key: HTTPCLIENT-2159
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2159
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>            Reporter: Michael Osipov
>            Priority: Major
>
> Based on [~reschke]'s
> [comment|https://issues.apache.org/jira/browse/HTTPCLIENT-2144?focusedCommentId=17310053&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17310053]:
> We are treating several content types incorrectly.
> {{org.apache.hc.core5.http.ContentType}} defines several content types with a
> default charset even though they are by definition UTF-8, do not have a
> {{charset}} parameter, or use another form of transport encoding. Affected are:
> {code}
> public static final ContentType APPLICATION_FORM_URLENCODED = create(
>         "application/x-www-form-urlencoded", StandardCharsets.ISO_8859_1);
> public static final ContentType APPLICATION_JSON = create(
>         "application/json", StandardCharsets.UTF_8);
> public static final ContentType APPLICATION_NDJSON = create(
>         "application/x-ndjson", StandardCharsets.UTF_8);
> public static final ContentType APPLICATION_PDF = create(
>         "application/pdf", StandardCharsets.UTF_8);
> public static final ContentType APPLICATION_PROBLEM_JSON = create(
>         "application/problem+json", StandardCharsets.UTF_8);
> public static final ContentType MULTIPART_FORM_DATA = create(
>         "multipart/form-data", StandardCharsets.ISO_8859_1);
> public static final ContentType MULTIPART_MIXED = create(
>         "multipart/mixed", StandardCharsets.ISO_8859_1);
> public static final ContentType MULTIPART_RELATED = create(
>         "multipart/related", StandardCharsets.ISO_8859_1);
> public static final ContentType TEXT_HTML = create(
>         "text/html", StandardCharsets.ISO_8859_1);
> public static final ContentType TEXT_EVENT_STREAM = create(
>         "text/event-stream", StandardCharsets.UTF_8);
> {code}
> * {{application/x-www-form-urlencoded}}: Does not have a charset parameter:
> https://www.iana.org/assignments/media-types/application/x-www-form-urlencoded.
> HTML5 defines how to apply an alternative encoding
> (https://url.spec.whatwg.org/#urlencoded-serializing), but UTF-8 is the
> standard.
> * {{application/json}}, {{application/x-ndjson}},
> {{application/problem+json}}: There is no charset definition because JSON is
> *always* UTF-8. The charset parameter has no meaning:
> https://datatracker.ietf.org/doc/html/rfc8259#section-11
> * {{application/pdf}}: This is a binary encoding; there is no charset.
> * {{text/event-stream}}: Defined *always* as UTF-8:
> https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events-intro
> * {{text/html}}: https://html.spec.whatwg.org/ does not define ISO-8859-1 to
> be the default encoding. It says that the encoding must be supplied by some
> means and that an algorithm is applied to find it. It seems that UTF-8 is
> expected these days.
> * {{multipart/mixed}}: Does not have a charset parameter; it is up to the
> parts to supply the proper encoding to perform byte-to-char conversion:
> https://datatracker.ietf.org/doc/html/rfc2046
> * {{multipart/related}}: Does not have a charset parameter; it is up to the
> parts to supply the proper encoding to perform byte-to-char conversion:
> https://datatracker.ietf.org/doc/html/rfc2387
> * {{multipart/form-data}}: Does not have a charset parameter; the RFC defines
> a {{_charset_}} form field for that:
> https://datatracker.ietf.org/doc/html/rfc7578#section-4.6
>
> {{charset}} applies to the transport layer only and never to the semantics of
> the content type, e.g., {{application/x-www-form-urlencoded}}.
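
The per-type rules listed in the issue can be sketched as a small lookup. This is only an illustration of the reasoning, not the change made in the linked pull request: {{MediaTypeCharsets}} and {{effectiveCharset}} are hypothetical names, not part of HttpCore, and the sketch uses only the JDK (Java 9+ for {{Set.of}}).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper (not an HttpCore class) mirroring the rules listed in the issue. */
public final class MediaTypeCharsets {

    // Types the issue describes as always UTF-8; a charset parameter has no meaning.
    private static final Set<String> ALWAYS_UTF8 = Set.of(
            "application/json",
            "application/x-ndjson",
            "application/problem+json",
            "text/event-stream");

    // Types the issue describes as having no charset parameter at all
    // (binary content, or the encoding is carried elsewhere, e.g. by the parts).
    private static final Set<String> NO_CHARSET = Set.of(
            "application/x-www-form-urlencoded",
            "application/pdf",
            "multipart/mixed",
            "multipart/related",
            "multipart/form-data");

    private MediaTypeCharsets() {
    }

    /**
     * Returns the charset to use for decoding, or empty if a charset does not
     * apply or is unknown. The declared charset (may be null) is only honored
     * for types outside the two sets above, such as text/html.
     */
    public static Optional<Charset> effectiveCharset(String mimeType, Charset declared) {
        String type = mimeType.trim().toLowerCase(Locale.ROOT);
        if (ALWAYS_UTF8.contains(type)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (NO_CHARSET.contains(type)) {
            return Optional.empty();
        }
        // No built-in default here: the encoding has to be supplied by some means.
        return Optional.ofNullable(declared);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset("application/json", StandardCharsets.ISO_8859_1));
        System.out.println(effectiveCharset("multipart/form-data", StandardCharsets.ISO_8859_1));
        System.out.println(effectiveCharset("text/html", null));
    }
}
```

In this sketch {{text/html}} deliberately gets no default: the caller has to pass whatever charset was declared or detected, which matches the point above that ISO-8859-1 is not a defined default.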