[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718569#comment-17718569
 ] 

Gary D. Gregory commented on HTTPCLIENT-2159:
---------------------------------------------

WRT ContentType, it sounds like the charset is configurable for some media 
types and fixed (not configurable) for others. At least ContentType is 
almost immutable (the charset is immutable), see [HTTPCORE-745].

> Invalid handling of charset content type parameter
> --------------------------------------------------
>
>                 Key: HTTPCLIENT-2159
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2159
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>            Reporter: Michael Osipov
>            Priority: Major
>
> Based on [~reschke]'s 
> [comment|https://issues.apache.org/jira/browse/HTTPCLIENT-2144?focusedCommentId=17310053&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17310053],
> we are treating several content types incorrectly. 
> {{org.apache.hc.core5.http.ContentType}} defines several content types which 
> are by definition UTF-8, do not have any {{charset}} parameter, or use 
> another form of transport encoding. Affected are:
> {code}
>     public static final ContentType APPLICATION_FORM_URLENCODED = create(
>             "application/x-www-form-urlencoded", StandardCharsets.ISO_8859_1);
>     public static final ContentType APPLICATION_JSON = create(
>             "application/json", StandardCharsets.UTF_8);
>     public static final ContentType APPLICATION_NDJSON = create(
>             "application/x-ndjson", StandardCharsets.UTF_8);
>     public static final ContentType APPLICATION_PDF = create(
>             "application/pdf", StandardCharsets.UTF_8);
>     public static final ContentType APPLICATION_PROBLEM_JSON = create(
>             "application/problem+json", StandardCharsets.UTF_8);
>     public static final ContentType MULTIPART_FORM_DATA = create(
>             "multipart/form-data", StandardCharsets.ISO_8859_1);
>     public static final ContentType MULTIPART_MIXED = create(
>             "multipart/mixed", StandardCharsets.ISO_8859_1);
>     public static final ContentType MULTIPART_RELATED = create(
>             "multipart/related", StandardCharsets.ISO_8859_1);
>     public static final ContentType TEXT_HTML = create(
>             "text/html", StandardCharsets.ISO_8859_1);
>     public static final ContentType TEXT_EVENT_STREAM = create(
>             "text/event-stream", StandardCharsets.UTF_8);
> {code}
> * {{application/x-www-form-urlencoded}}: Does not have a charset parameter: 
> https://www.iana.org/assignments/media-types/application/x-www-form-urlencoded.
>  HTML5 defines how to apply an alternative encoding 
> (https://url.spec.whatwg.org/#urlencoded-serializing), but UTF-8 is the 
> default.
> * {{application/json}}, {{application/x-ndjson}}, 
> {{application/problem+json}}: There is no charset definition because JSON is 
> *always* UTF-8. The charset parameter has no meaning: 
> https://datatracker.ietf.org/doc/html/rfc8259#section-11
> * {{application/pdf}}: This is a binary format, so no charset applies.
> * {{text/event-stream}}: Defined *always* as UTF-8: 
> https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events-intro
> * {{text/html}}: https://html.spec.whatwg.org/ does not define ISO-8859-1 to 
> be the default encoding. It says that the encoding must be supplied by some 
> means and that an algorithm is applied to determine it. It seems that UTF-8 
> is expected these days.
> * {{multipart/mixed}}: Does not have a charset parameter, it is up to the 
> parts to supply proper encoding to perform byte-to-char conversion: 
> https://datatracker.ietf.org/doc/html/rfc2046
> * {{multipart/related}}: Does not have a charset parameter, it is up to the 
> parts to supply proper encoding to perform byte-to-char conversion: 
> https://datatracker.ietf.org/doc/html/rfc2387
> * {{multipart/form-data}}: Does not have a charset parameter, the RFC defines 
> a {{_charset_}} form field for that: 
> https://datatracker.ietf.org/doc/html/rfc7578#section-4.6
>
> {{charset}} applies to the transport layer only and never to the semantics of 
> the content type, e.g. for {{application/x-www-form-urlencoded}}.
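
The classification in the list above could be sketched roughly as follows. This is only an illustration, not HttpCore code: the class `CharsetParameterSketch` and its methods are hypothetical names, and the set of media types is copied from the issue's list (everything except {{text/html}}, which the issue describes as needing an explicitly supplied encoding). The assumption is that a caller-supplied charset is simply dropped for media types that are fixed to UTF-8, are binary, or have no charset parameter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of HttpCore. */
final class CharsetParameterSketch {

    // Media types from the issue for which a caller-supplied charset is
    // assumed to be meaningless: fixed UTF-8, binary, or no charset parameter.
    private static final Set<String> FIXED_OR_NO_CHARSET = Set.of(
            "application/x-www-form-urlencoded",
            "application/json",
            "application/x-ndjson",
            "application/problem+json",
            "application/pdf",
            "text/event-stream",
            "multipart/mixed",
            "multipart/related",
            "multipart/form-data");

    private CharsetParameterSketch() {
    }

    /** Returns true if this sketch treats the charset as configurable. */
    static boolean acceptsCharset(String mimeType) {
        // Media type names are case-insensitive, so normalise before lookup.
        return !FIXED_OR_NO_CHARSET.contains(mimeType.toLowerCase(Locale.ROOT));
    }

    /** Builds a Content-Type value, omitting charset where it does not apply. */
    static String headerValue(String mimeType, Charset charset) {
        if (charset == null || !acceptsCharset(mimeType)) {
            return mimeType;
        }
        return mimeType + "; charset=" + charset.name();
    }

    /** Example inputs: one fixed type and one configurable type. */
    static String[] examples() {
        return new String[] {
            headerValue("application/json", StandardCharsets.UTF_8),
            headerValue("text/html", StandardCharsets.UTF_8)
        };
    }
}
```

Under these assumptions, `headerValue("application/json", StandardCharsets.UTF_8)` yields a bare `application/json`, while `text/html` keeps its `charset` parameter. Whether the real `ContentType` constants should drop the charset entirely or reject attempts to change it is the open question of this issue.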


