[ https://issues.apache.org/jira/browse/PDFBOX-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003799#comment-18003799 ]
Ilgoo Kim edited comment on PDFBOX-6032 at 7/8/25 2:01 PM:
-----------------------------------------------------------

I see - excessive exposure could be an issue.

Because our implementation handles different cases based on specific conditions rather than relying on a fallback, we found it necessary to customize the function accordingly.

In our current customization, we perform an empirical check after PNGConverter fails. The check estimates whether converting the image with JPEGFactory would significantly alter its original appearance. If the result would not differ significantly from the original image, the image is converted using JPEGFactory; otherwise, LosslessFactory is used instead. We aimed to improve performance by slightly increasing the use of JPEGFactory compared to the original implementation.

This is our current version of the "createFromByteArray" function:

{code:java}
public PDImageXObject createFromByteArray(PDDocument document, byte[] byteArray, String name) throws IOException {
    FileType fileType = FileTypeDetector.detectFileType(byteArray);
    if (fileType == null) {
        throw new IllegalArgumentException("Image type not supported: " + name);
    }
    if (fileType == FileType.JPEG) {
        return JPEGFactory.createFromByteArray(document, byteArray);
    }
    if (fileType == FileType.PNG) {
        // Try to directly convert the image without recoding it.
        PDImageXObject image = PNGConverter.convertPNGImage(document, byteArray);
        if (image != null) {
            return image;
        }
    }
    if (fileType == FileType.TIFF) {
        try {
            return CCITTFactory.createFromByteArray(document, byteArray);
        } catch (IOException ex) {
            log.debug("Reading as TIFF failed, setting fileType to PNG", ex);
            // Plan B: try reading with ImageIO
            // common exception:
            // First image in tiff is not CCITT T4 or T6 compressed
            fileType = FileType.PNG;
        }
    }
    if (fileType == FileType.BMP || fileType == FileType.GIF || fileType == FileType.PNG) {
        ByteArrayInputStream inputStream = new ByteArrayInputStream(byteArray);
        BufferedImage bufferedImage = ImageIO.read(inputStream);
        switch (getImageTransparencyType(bufferedImage)) {
            case REQUIRES_LOSSLESS -> {
                return LosslessFactory.createFromImage(document, bufferedImage);
            }
            case FLATTENABLE_ALPHA -> {
                return createAlphaAwarePDImageXObject(document, bufferedImage);
            }
            case NO_ALPHA -> {
                try {
                    return JPEGFactory.createFromByteArray(document, byteArray);
                } catch (Exception e) {
                    return LosslessFactory.createFromImage(document, bufferedImage);
                }
            }
        }
    }
    throw new IllegalArgumentException("Image type " + fileType + " not supported: " + name);
}
{code}

The final condition differs from the original codebase. The empirical check is implemented in the "getImageTransparencyType(bufferedImage)" function. The "createAlphaAwarePDImageXObject" function contains the flattening logic, which simulates the effect of alpha so that the image can be generated with JPEGFactory.

Since we want to retain the original createFromByteArray logic as much as possible, we had to copy the entire PNGConverter into our project codebase. Would it be possible to support this kind of customization in a simpler way? It would be much better for us if we didn't have to include the entire PNGConverter, and it would be a great help if the PNGConverter class could be made "public final" and its methods exposed.

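For readers following the thread, here is a rough sketch of what a pixel-by-pixel alpha check and an alpha-flattening step might look like. It is not the reporter's actual code: the class name `AlphaCheckSketch`, the method names `classify` and `flattenOntoWhite`, the white background, and the 5% threshold are all assumptions made for illustration. Only the three enum constant names are taken from the comment above. The sketch uses plain `java.awt` classes and does not touch PDFBox.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class AlphaCheckSketch {

    /** Constant names follow the comment above; the enum type itself is hypothetical. */
    public enum TransparencyType { NO_ALPHA, FLATTENABLE_ALPHA, REQUIRES_LOSSLESS }

    // Assumed threshold: if more than 5% of the pixels are not fully opaque,
    // flattening is treated as changing the image too much.
    private static final double MAX_NON_OPAQUE_RATIO = 0.05;

    /** Scans every pixel and classifies the image by how much transparency it uses. */
    public static TransparencyType classify(BufferedImage image) {
        if (!image.getColorModel().hasAlpha()) {
            return TransparencyType.NO_ALPHA;
        }
        int width = image.getWidth();
        int height = image.getHeight();
        long nonOpaque = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // getRGB returns ARGB; the top 8 bits hold the alpha value.
                int alpha = image.getRGB(x, y) >>> 24;
                if (alpha != 0xFF) {
                    nonOpaque++;
                }
            }
        }
        if (nonOpaque == 0) {
            // An alpha channel exists, but every pixel is fully opaque.
            return TransparencyType.NO_ALPHA;
        }
        double ratio = (double) nonOpaque / ((long) width * height);
        return ratio <= MAX_NON_OPAQUE_RATIO
                ? TransparencyType.FLATTENABLE_ALPHA
                : TransparencyType.REQUIRES_LOSSLESS;
    }

    /** Composites the image over an opaque white background and drops the alpha channel. */
    public static BufferedImage flattenOntoWhite(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        BufferedImage rgb = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.drawImage(image, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

Under these assumptions, an image classified as `FLATTENABLE_ALPHA` could be passed through `flattenOntoWhite` and the result handed to PDFBox's `JPEGFactory.createFromImage(document, image)`, while `REQUIRES_LOSSLESS` images would still go to `LosslessFactory.createFromImage`. The real criteria used in the reporter's project may differ.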
> Issues encountered while customizing "PDImageXObject"
> ------------------------------------------------------
>
>                 Key: PDFBOX-6032
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6032
>             Project: PDFBox
>          Issue Type: Wish
>          Components: PDModel
>    Affects Versions: 3.0.5 PDFBox
>            Reporter: Ilgoo Kim
>            Priority: Major
>
> In my team, we are using PDFBox to add a PDF-export feature to our editor service.
> Unfortunately the performance does not meet our expectations, especially when LosslessFactory is invoked within the "createFromByteArray" function of PDImageXObject.
> Therefore we customized the "createFromByteArray" function to favor JPEGFactory over LosslessFactory in order to improve performance, even at the cost of some image quality loss (the choice is based on a pixel-by-pixel alpha check).
> However, bringing the "createFromByteArray" function into our project introduces a problem: since "PNGConverter" is not public, we are forced to copy the entire "PNGConverter" class into our codebase as well.
> I was wondering if it would be possible to make the "PNGConverter" class public, or alternatively, if there is a recommended way to better customize the "createFromByteArray" function.
> Thank you.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org