Non-latin language in FreeText Annotation problems

2024-12-12 Thread TOM TOM
Hi, Using Apache PDFBox 3.0.3 to create a Free Text annotation that contains text in Greek, we've come across a problem and would appreciate your help. 1. First we tried to use: annotation.setDefaultAppearance("/Helvetica 10 Tf 0 0 0 rg"); Calling annotation.constructAppearances(document); it th

Re: Non-latin language in FreeText Annotation problems

2024-12-12 Thread TOM TOM
Dear Tilman, Your solution works great. Thank you so much for solving my problem. Best wishes

On Thu, 12 Dec 2024 at 1:19 PM, Tilman Hausherr <thaush...@t-online.de> wrote:
> Hi,
>
> You need to add the font to the default resources and don't subset it,
> here is what I changed in th

Re: Non-latin language in FreeText Annotation problems

2024-12-12 Thread Tilman Hausherr
Hi, You need to add the font to the default resources and don't subset it, here is what I changed in the AddAnnotations example:

    PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
    if (acroForm == null)
    {
        acroForm = new PDAcr

Text extraction adding lots of strange spaces

2024-12-12 Thread Kevin Day
Hello- We are using PDFTextStripper, and have found some cases where there are a *lot* of extraneous spaces being added to the output. It almost acts like the stripper is thinking that the space width of the font is super tiny. I managed to get a document that exhibits the behavior: https://dri

Re: Text extraction adding lots of strange spaces

2024-12-12 Thread Tilman Hausherr
Hi, These spaces are really in the PDF:

    BT
    /Content <>BDC
    1 i
    /T1_4 1 Tf
    7 0 0 7 *195.4 110.502* Tm
    *(\( \))Tj*
    EMC
    /Content <>BDC
    /T1_0 1 Tf
    -19.686 6.786 Td
    (Beginning capital account)Tj
    /T1_4 1 Tf
    ( )Tj
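
To check a content stream by hand for this kind of thing, one rough approach is to pull out every literal string operand and flag the ones that contain nothing but whitespace. The sketch below is plain Java and does not use PDFBox. The class name `TjScanner` and both method names are made up for illustration, the fragment is a simplified copy of the operators quoted above (marked-content operators left out, spacing inside the strings as shown in the message), and the scanner only understands backslash-escaped single characters, not octal escapes or hex strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: scans a content stream fragment
// for literal strings "( ... )" and reports which ones are only whitespace.
class TjScanner {

    // Returns the decoded literal strings in the order they appear.
    // Simplification: a backslash just keeps the next character as-is.
    public static List<String> literalStrings(String stream) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < stream.length()) {
            if (stream.charAt(i) != '(') {
                i++;
                continue;
            }
            StringBuilder sb = new StringBuilder();
            int depth = 1;
            i++; // skip the opening parenthesis
            while (i < stream.length() && depth > 0) {
                char c = stream.charAt(i);
                if (c == '\\' && i + 1 < stream.length()) {
                    sb.append(stream.charAt(i + 1)); // escaped character
                    i += 2;
                    continue;
                }
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                    if (depth == 0) {
                        i++; // skip the closing parenthesis
                        break;
                    }
                }
                sb.append(c);
                i++;
            }
            out.add(sb.toString());
        }
        return out;
    }

    // True if the string has no visible characters at all.
    public static boolean isWhitespaceOnly(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Simplified fragment based on the operators quoted in the thread.
        String fragment = "BT /T1_4 1 Tf 7 0 0 7 195.4 110.502 Tm (\\( \\))Tj "
                + "/T1_0 1 Tf -19.686 6.786 Td (Beginning capital account)Tj "
                + "/T1_4 1 Tf ( )Tj";
        for (String s : literalStrings(fragment)) {
            System.out.println("[" + s + "] whitespace only: " + isWhitespaceOnly(s));
        }
    }
}
```

Strings flagged as whitespace only are spaces that the PDF producer wrote into the stream itself, so a text extractor that reports them is reproducing the file rather than inventing gaps.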