[ https://issues.apache.org/jira/browse/IMAGING-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17915747#comment-17915747 ]
Thomas Stieler commented on IMAGING-380:
----------------------------------------

My pull request is already merged, therefore I can close this issue!

> Option to enforce UTF-8 encoding for IPTC records
> -------------------------------------------------
>
>                 Key: IMAGING-380
>                 URL: https://issues.apache.org/jira/browse/IMAGING-380
>             Project: Commons Imaging
>          Issue Type: New Feature
>          Components: Format: JPEG
>    Affects Versions: 1.0.0-alpha5
>            Reporter: Thomas Stieler
>            Priority: Major
>
> We are using commons-imaging to add IPTC records to JPEG images.
> Currently, the encoding for IPTC values is determined by testing whether all
> values can be encoded with ISO-8859-1 (see
> [here|https://github.com/apache/commons-imaging/blob/master/src/main/java/org/apache/commons/imaging/formats/jpeg/iptc/IptcParser.java#L375-L382]).
>
> * if ISO-8859-1 is used as the charset, no envelope record "Coded Character Set"
> is written
> * if ISO-8859-1 cannot be used, the IPTC values are converted into UTF-8 bytes
> and the envelope record "Coded Character Set" is set accordingly
>
> The problem with this strategy is that some applications/libraries use UTF-8
> as the default charset when parsing IPTC records if "Coded Character Set" is
> not set. In this case, special characters (like "äöü") in IPTC records are
> parsed incorrectly.
> Currently there is no option to enforce UTF-8 encoding. Such an option would
> improve the reliability of correct parsing of IPTC records written by
> commons-imaging.
> I already made some small changes to offer an option for enforcing UTF-8
> encoding without changing the current default behaviour. It would be cool if
> someone could review my pull request:
> https://github.com/apache/commons-imaging/pull/477
> Please send me a message if something is missing or incomplete!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
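The following is a minimal sketch of the charset-selection strategy described in the issue. It is not the commons-imaging implementation or API. The class and the names `chooseCharset`, `forceUtf8` and `codedCharacterSetRecord` are hypothetical and serve only as illustration. The sketch assumes the IPTC IIM dataset layout: a 0x1C tag marker, record number, dataset number, a 2-byte big-endian length, then the data. It also assumes that envelope dataset 1:90 ("Coded Character Set") designates UTF-8 with the escape sequence ESC % G. The sketch shows the failure case: "äöü" fits into ISO-8859-1, so no 1:90 record is emitted, and a reader that defaults to UTF-8 misparses the bytes. Forcing UTF-8 always emits the record.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

/**
 * Hypothetical sketch of the IPTC charset choice discussed in IMAGING-380.
 * Names are illustrative only and are NOT the commons-imaging API.
 */
public class IptcCharsetSketch {

    // IIM envelope dataset 1:90 "Coded Character Set": ESC % G designates UTF-8.
    static final byte[] UTF8_DESIGNATION = {0x1B, 0x25, 0x47};

    /**
     * Default behaviour: use ISO-8859-1 if every value can be encoded with it,
     * otherwise fall back to UTF-8. The hypothetical forceUtf8 flag skips the test.
     */
    static Charset chooseCharset(List<String> values, boolean forceUtf8) {
        if (forceUtf8) {
            return StandardCharsets.UTF_8;
        }
        for (String value : values) {
            // A fresh encoder per check, since CharsetEncoder is stateful.
            if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(value)) {
                return StandardCharsets.UTF_8;
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    /**
     * Returns the 1:90 envelope dataset for UTF-8, or an empty array when
     * ISO-8859-1 is used (no "Coded Character Set" record is written).
     */
    static byte[] codedCharacterSetRecord(Charset charset) {
        if (!StandardCharsets.UTF_8.equals(charset)) {
            return new byte[0];
        }
        byte[] record = new byte[5 + UTF8_DESIGNATION.length];
        record[0] = 0x1C;                            // tag marker
        record[1] = 1;                               // record number: envelope
        record[2] = 90;                              // dataset number: Coded Character Set
        record[3] = 0;                               // data length, high byte
        record[4] = (byte) UTF8_DESIGNATION.length;  // data length, low byte
        System.arraycopy(UTF8_DESIGNATION, 0, record, 5, UTF8_DESIGNATION.length);
        return record;
    }

    public static void main(String[] args) {
        // "äöü" written as Unicode escapes to avoid source-encoding issues.
        List<String> values = List.of("\u00e4\u00f6\u00fc");
        Charset auto = chooseCharset(values, false);
        Charset forced = chooseCharset(values, true);
        System.out.println(auto + " -> 1:90 record bytes: " + codedCharacterSetRecord(auto).length);
        System.out.println(forced + " -> 1:90 record bytes: " + codedCharacterSetRecord(forced).length);
    }
}
```

With the default path, the umlauts are stored as single ISO-8859-1 bytes without any charset declaration. That is exactly the ambiguity the issue describes. An opt-in flag keeps the current default unchanged while letting callers always declare UTF-8 explicitly.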