[jira] [Comment Edited] (PDFBOX-6032) Issues encountered while customizing "PDImageXObject"

Ilgoo Kim (Jira) Tue, 08 Jul 2025 07:02:33 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003799#comment-18003799
 ]


Ilgoo Kim edited comment on PDFBOX-6032 at 7/8/25 2:01 PM:
-----------------------------------------------------------

I see - excessive exposure could be an issue.

Because our implementation handles different cases based on specific conditions 
rather than relying on a fallback, we found it necessary to customize the 
function accordingly.

In our current customization, we perform an empirical check after PNGConverter 
fails.
This logic determines whether converting the image using JPEGFactory would 
significantly alter its original appearance.

If this logic determines that the result from JPEGFactory would not differ 
significantly from the original image, the image is converted using 
JPEGFactory. Otherwise, if the difference is expected to be substantial, 
LosslessFactory is used instead. We aimed to improve performance by slightly 
increasing the use of JPEGFactory compared to the original implementation.

This is our current version of the "createFromByteArray" function:
{code:java}
public PDImageXObject createFromByteArray(PDDocument document, byte[] byteArray, String name)
        throws IOException {
    FileType fileType = FileTypeDetector.detectFileType(byteArray);
    if (fileType == null) {
        throw new IllegalArgumentException("Image type not supported: " + name);
    }
    if (fileType == FileType.JPEG) {
        return JPEGFactory.createFromByteArray(document, byteArray);
    }
    if (fileType == FileType.PNG) {
        // Try to directly convert the image without recoding it.
        PDImageXObject image = PNGConverter.convertPNGImage(document, byteArray);
        if (image != null) {
            return image;
        }
    }
    if (fileType == FileType.TIFF) {
        try {
            return CCITTFactory.createFromByteArray(document, byteArray);
        } catch (IOException ex) {
            log.debug("Reading as TIFF failed, setting fileType to PNG", ex);
            // Plan B: try reading with ImageIO
            // common exception:
            // First image in tiff is not CCITT T4 or T6 compressed
            fileType = FileType.PNG;
        }
    }
    if (fileType == FileType.BMP || fileType == FileType.GIF || fileType == FileType.PNG) {
        BufferedImage bufferedImage = ImageIO.read(new ByteArrayInputStream(byteArray));
        if (bufferedImage == null) {
            // ImageIO.read returns null when no registered reader can decode the data.
            throw new IllegalArgumentException("Image could not be decoded: " + name);
        }
        switch (getImageTransparencyType(bufferedImage)) {
            case REQUIRES_LOSSLESS -> {
                return LosslessFactory.createFromImage(document, bufferedImage);
            }
            case FLATTENABLE_ALPHA -> {
                return createAlphaAwarePDImageXObject(document, bufferedImage);
            }
            case NO_ALPHA -> {
                try {
                    // byteArray does not hold JPEG data in this branch, so encode
                    // from the decoded image rather than from the raw bytes.
                    return JPEGFactory.createFromImage(document, bufferedImage);
                } catch (Exception e) {
                    // Fall back to lossless encoding if JPEG encoding fails.
                    return LosslessFactory.createFromImage(document, bufferedImage);
                }
            }
        }
    }
    throw new IllegalArgumentException("Image type " + fileType + " not supported: " + name);
}
{code}
Only the final branch (BMP, GIF and PNG) differs from the original PDFBox code.

The empirical check logic is implemented in the 
"getImageTransparencyType(bufferedImage)" function.
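
For readers following the thread, here is a minimal sketch of what such a pixel-by-pixel alpha check might look like. It is not the reporter's implementation, which is not shown in this comment. The class name, the enum layout and the `MIN_FLATTENABLE_ALPHA` threshold are assumptions made for illustration; only the three enum constant names come from the code above.

```java
import java.awt.image.BufferedImage;

public class TransparencyCheck {

    /** Constant names taken from the switch above; the semantics are assumed. */
    public enum TransparencyType { NO_ALPHA, FLATTENABLE_ALPHA, REQUIRES_LOSSLESS }

    /**
     * Hypothetical threshold: pixels at or above this alpha are treated as
     * "nearly opaque", so flattening them should barely change the appearance.
     */
    public static final int MIN_FLATTENABLE_ALPHA = 250;

    public static TransparencyType getImageTransparencyType(BufferedImage image) {
        // Images without an alpha channel cannot lose transparency.
        if (!image.getColorModel().hasAlpha()) {
            return TransparencyType.NO_ALPHA;
        }
        int minAlpha = 255;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB returns ARGB; the top 8 bits are the alpha value.
                int alpha = image.getRGB(x, y) >>> 24;
                if (alpha < MIN_FLATTENABLE_ALPHA) {
                    // Visible transparency: a lossy, flattened result would differ.
                    return TransparencyType.REQUIRES_LOSSLESS;
                }
                minAlpha = Math.min(minAlpha, alpha);
            }
        }
        // Alpha channel present but every pixel fully opaque: treat as no alpha.
        return minAlpha == 255 ? TransparencyType.NO_ALPHA
                               : TransparencyType.FLATTENABLE_ALPHA;
    }
}
```

The early return keeps the scan cheap for images with obvious transparency, which matters when the goal is export performance. The threshold would need tuning against real documents.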

The "createAlphaAwarePDImageXObject" function contains the flattening logic, 
which simulates the effect of alpha so that the image can be generated using 
JPEGFactory.
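
As an illustration of the flattening step only, the sketch below composites an image with alpha onto an opaque background using plain Java 2D. The `AlphaFlattener` class, the `flattenOnto` method and the choice of background colour are hypothetical; the reporter's actual "createAlphaAwarePDImageXObject" is not shown in this comment and may work differently.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class AlphaFlattener {

    /**
     * Draws the source image over a solid background and returns an opaque
     * RGB image. Transparent areas take on the background colour.
     */
    public static BufferedImage flattenOnto(BufferedImage source, Color background) {
        BufferedImage flat = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = flat.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, flat.getWidth(), flat.getHeight());
            // The default composite is SrcOver, so alpha is blended here.
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return flat;
    }
}
```

The flattened image has no alpha channel, so it could then be handed to `JPEGFactory.createFromImage(document, flat)`. Whether white or some other colour is the right background depends on what the image is drawn over in the PDF.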

Since we want to retain the original createFromByteArray logic as much as 
possible, we had to bring the entire PNGConverter into our project codebase.

Would it be possible to support this kind of customization in a simpler way? It 
would be much better for us if we didn’t have to include the entire 
PNGConverter.

It would be a great help for us if the PNGConverter class could be made "public 
final" and its methods exposed.



> Issues encountered while customizing "PDImageXObject"
> ------------------------------------------------------
>
>                 Key: PDFBOX-6032
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6032
>             Project: PDFBox
>          Issue Type: Wish
>          Components: PDModel
>    Affects Versions: 3.0.5 PDFBox
>            Reporter: Ilgoo Kim
>            Priority: Major
>
> In my team, we are using PDFBox to add a PDF-export feature to our editor 
> service.
> Unfortunately, the performance does not meet our expectations, especially 
> when LosslessFactory is invoked within the "createFromByteArray" function of 
> PDImageXObject.
> Therefore we customized the "createFromByteArray" function to favor 
> JPEGFactory over LosslessFactory in order to improve performance, even at the 
> cost of some image quality loss. The choice between the two is based on a 
> pixel-by-pixel alpha check.
> However, bringing the "createFromByteArray" function into our project 
> introduces a problem: since "PNGConverter" is not public, we are forced to 
> copy the entire "PNGConverter" class into our codebase as well.
> I was wondering if it would be possible to make the "PNGConverter" class 
> public, or alternatively, if there is a recommended way to better customize 
> the "createFromByteArray" function.
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
