Re: pdfbox 3.0.0 upgrade, flatten method causes garbage

Pados Attila Mon, 18 Sep 2023 11:25:55 -0700

Hi,
thanks for the help!

I was finally able to try the 3.0.1-SNAPSHOT version an hour ago. The main
issue, garbage text in the header line of the PDF's table, is solved; this
looks OK.

There are some other problems, all related to the flatten() method.
One is that images added to a PDF form are converted to a PDPushButton
component on the AcroForm, and are then subject to flattening.
Version 2.0.29 correctly removes them from the form and adds the image to
the content stream.

Versions 3.0.0 and 3.0.1-SNAPSHOT both produce a PDF with missing images.
I added a filter so that these fields are excluded from the flatten call:

private static List<PDField> filterPDPushButtonFields(PDAcroForm form) {
    return form.getFields().stream()
            .filter(checkBeingPdPushButton())
            .collect(Collectors.toList());
}

form.flatten(filterPDPushButtonFields(form), false);

This looks OK as far as the image is concerned.
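
The checkBeingPdPushButton() helper is not shown in this message, so its exact logic is unknown. As a minimal sketch of the exclusion step described above, assuming the goal is to keep every field that is not of a given type, a generic helper could look like the following. The class name FieldFilter and the method name excludeInstancesOf are hypothetical and are not part of PDFBox.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of PDFBox: keeps only the elements
// that are NOT instances of the excluded type.
final class FieldFilter {
    private FieldFilter() {
    }

    static <T> List<T> excludeInstancesOf(List<T> items, Class<?> excluded) {
        return items.stream()
                // Drop every element whose runtime type matches the excluded class.
                .filter(item -> !excluded.isInstance(item))
                .collect(Collectors.toList());
    }
}
```

With the calls already used in this message, this would presumably be applied as form.flatten(FieldFilter.excludeInstancesOf(form.getFields(), PDPushButton.class), false). Note that getFields() returns only the root-level fields, so a push button nested under a parent field would not be filtered individually by this sketch.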
I re-tested this with 3.0.1-SNAPSHOT, and the text is now distorted, with
letters badly positioned:

[image: distorted-text.png]

I am using the following to load the template PDF:

InputStream resourceAsStream = ...
PDDocument a1doc = Loader.loadPDF(
        new RandomAccessReadBuffer(resourceAsStream),
        () -> new ScratchFile(MemoryUsageSetting.setupTempFileOnly()));

The original code used only a temp file for buffering the PDF load, and I
tried to imitate that. When the errors appeared, I also tried using
RandomAccessReadBuffer, but it had no effect on the problems, so I switched
back to the temp file.

PDAcroForm form = a1doc.getDocumentCatalog().getAcroForm(null);
I am intentionally passing null here to prevent the fixups from running, so
that the new behavior changes as little as possible.

On Sat, Sep 16, 2023 at 7:20 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

> Hi,
>
> Andreas fixed it, please try a snapshot
>
> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.1-SNAPSHOT/
>
> Tilman
>
> On 16.09.2023 05:05, Tilman Hausherr wrote:
> > This sounds similar to
> > https://issues.apache.org/jira/browse/PDFBOX-5666
> >
> > Tilman
> >
> > On 15.09.2023 23:40, Pados Attila wrote:
> >> Our application uses 2.0.29 version without any problem. I am trying
> >> to upgrade to 3.0.0.
> >>
> >> The application follows the pattern of loading in a template pdf,
> >> which was edited to have some fields, with title and default value.
> >> The program later updates the field's value, and then calls flatten()
> >> on the AcroForm, so the fields are transformed to a readable,
> >> positioned text.
> >>
> >> After the change to 3.0.0 and calling the flatten() method, the
> >> field's title/caption values which are created on the template, turn
> >> into garbage.
> >>
> >> One example is when a table/grid element's header line contains
> >> text that includes parentheses ().
> >> After the parentheses, the text becomes garbage.
> >>
> >> Images added on the template, as part of the form, were left out of the
> >> output.
> >> I could work around this by filtering the acroform fields, and skip
> >> the PDPushButton type fields.
> >>
> >> I could check with the pdfdebugger, that the main font used in the
> >> template is lost in the output.
> >> Any help would be appreciated, I am basically stuck with this task,
> >> and probably there is a PdfBox bug in the background.
> >>
> >> I am expert with java, but not with pdfbox, the original working code
> >> was made by someone else in the team.
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > For additional commands, e-mail: users-h...@pdfbox.apache.org
> >
>
>

-- 
Attila Pados
Java developer
+36204432457
