[ 
https://issues.apache.org/jira/browse/PDFBOX-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068194#comment-18068194
 ] 

Andreas Lehmkühler commented on PDFBOX-6178:
--------------------------------------------

Sorry, I'm a little late to the party. IMHO everything looks good so far. 
But I'm wondering why COSName#getName still tries to convert the bytes to a 
"real" string without those escaped special characters. Either we change that 
or we add another method that returns the original representation. Right now 
the debugger shows the converted value rather than the original one, so we 
have to use some other tool to detect/inspect such cases.
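
To illustrate why the converted value alone is not enough for debugging, here is a minimal JDK-only sketch. It is not PDFBox code: the class RawNameSketch and its methods are hypothetical, and the decoding policy (strict UTF-8 first, ISO-8859-1 as fallback) is only an assumption for the example. Both spellings of the name end up as the same Java string, while the raw bytes still tell them apart.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: keeps raw name bytes apart from decoded text. */
final class RawNameSketch {

    /** Resolves #xx escapes in a name token (given without the leading slash) to raw bytes. */
    static byte[] unescape(String token) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '#' && i + 2 < token.length()) {
                int hi = Character.digit(token.charAt(i + 1), 16);
                int lo = Character.digit(token.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits
                    continue;
                }
            }
            out.write(c); // token is assumed to be ASCII apart from escapes
        }
        return out.toByteArray();
    }

    /** Assumed policy for this sketch: strict UTF-8, falling back to ISO-8859-1. */
    static String decode(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    /** Renders the raw bytes as hex so the original representation can be inspected. */
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

With this sketch, decode(unescape("m#e4nnlich")) and decode(unescape("m#c3#a4nnlich")) yield equal strings, whereas toHex shows eight bytes for the first token and nine for the second.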

> PdfBox renames RadioButton with Umlaut
> --------------------------------------
>
>                 Key: PDFBOX-6178
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6178
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm
>    Affects Versions: 2.0.36, 3.0.5 PDFBox, 3.0.6 PDFBox, 3.0.7 PDFBox
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 2.0.37, 3.0.8 PDFBox, 4.0.0
>
>         Attachments: form_empty.pdf, form_selected_ASCII_NUL_acrobat.pdf, 
> form_selected_acrobat_pro.pdf, form_selected_pdfbox.pdf, 
> form_selected_pdfbox_patched.pdf
>
>
> From the users mailing list:
> 1. Create a document that contains a radio button with an umlaut in its 
> name. I can give you an example document.
> Let's say: A radio group "Geschlecht" with the buttons "männlich" and 
> "weiblich".
> Do not use PdfBox for this step. I used Acrobat Pro 2020.
> The name/value of the "männlich" button is encoded as "/m#e4nnlich" in the 
> PDF.
> 2. Update the value of the radio group with PdfBox to "männlich" and save it 
> to a new document.
> {code}
> import java.io.File;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class UpdateRadioGroup {
>     private static final String INPUT_FILE = "form_empty.pdf";
>     private static final String OUTPUT_FILE = "form_selected.pdf";
>     private static final String FIELD_NAME = "Geschlecht";
>     private static final String FIELD_VALUE = "männlich";
>
>     public static void main(String[] args) throws Exception {
>         try (PDDocument document = Loader.loadPDF(new File(INPUT_FILE))) {
>             // Select the "männlich" button of the radio group.
>             document.getDocumentCatalog()
>                     .getAcroForm(null)
>                     .getField(FIELD_NAME)
>                     .setValue(FIELD_VALUE);
>             // Save the result to a new file.
>             document.save(new File(OUTPUT_FILE));
>         }
>     }
> }
> {code}
> 3. Validate the name/value of the "männlich" button in the new document in a 
> text editor. PdfBox encodes "männlich" to "/m#c3#a4nnlich" (see 
> COSName.writePDF() ).
> The Problem
>  ===============
>  PdfBox renames the radio button from "männlich" to "mÃ¤nnlich", or from
>  "/m#e4nnlich" to "/m#c3#a4nnlich" in PDF syntax.
>  When you read the document again, PdfBox converts "#c3#a4" to "ä" but
>  all other programs do not. I tested Acrobat Pro 2020, the current Acrobat
>  Reader, and PDFXplorer from https://www.o2sol.com



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
