Zer Jun Eng created PDFBOX-6030:
-----------------------------------

             Summary: JPEGFactory: createImage and setOptimizeHuffmanTables
                 Key: PDFBOX-6030
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6030
             Project: PDFBox
          Issue Type: Wish
            Reporter: Zer Jun Eng
         Attachments: zoo-711050_1920.jpg

Dear PDFBox developers,

I'm writing to request an enhancement to the JPEGFactory class, specifically concerning the createFromImage(PDDocument document, BufferedImage image, float quality, int dpi) method.

Currently, this method offers no direct way to enable the setOptimizeHuffmanTables option of JPEGImageWriteParam. This optimization can be quite beneficial for reducing file size.

To work around this, my team currently has to copy the JPEGFactory source code into our project and modify the private encodeImageToJPEGStream method. This approach isn't ideal, as it makes maintenance more difficult and prevents us from easily updating to new PDFBox versions.

Would you consider exposing the setOptimizeHuffmanTables option, perhaps as an additional parameter to the createFromImage method or through a separate setter on JPEGFactory? This would allow users to take advantage of the optimization without resorting to workarounds.

Thank you for considering this request.

----

Replying to the email thread: https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3

I wrote a minimal benchmark that compares the output file size and execution time with and without setOptimizeHuffmanTables:

{code:java}
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;

class Huffman
{
    // Returns the first JPEG writer whose default parameters are a JPEGImageWriteParam.
    private static ImageWriter getJPEGImageWriter() throws IOException
    {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
        while (writers.hasNext())
        {
            ImageWriter writer = writers.next();
            if (writer == null)
            {
                continue;
            }
            // PDFBOX-3566: avoid CLibJPEGImageWriter, which is not a JPEGImageWriteParam
            if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam)
            {
                return writer;
            }
            writer.dispose();
        }
        throw new IOException("No ImageWriter found for JPEG format");
    }

    // Encodes the image as JPEG in memory, optionally with optimized Huffman tables.
    public static byte[] encodeImageToJPEGStream(BufferedImage image, float quality, int dpi,
            boolean optimizeHuffman) throws IOException
    {
        ImageWriter imageWriter = getJPEGImageWriter(); // find JAI writer
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos))
        {
            imageWriter.setOutput(ios);

            // add compression
            JPEGImageWriteParam jpegParam = (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
            jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            jpegParam.setCompressionQuality(quality);
            jpegParam.setOptimizeHuffmanTables(optimizeHuffman);

            // add metadata
            ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
            IIOMetadata data = imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
            Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
            Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
            String dpiString = Integer.toString(dpi);
            jfif.setAttribute("Xdensity", dpiString);
            jfif.setAttribute("Ydensity", dpiString);
            jfif.setAttribute("resUnits", "1"); // 1 = dots/inch

            // write
            imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
            return baos.toByteArray();
        }
        finally
        {
            imageWriter.dispose();
        }
    }

    // Times one encode, prints size and duration, and returns the duration in ms.
    public static long benchmark(BufferedImage img, boolean optimizeHuffman) throws IOException
    {
        final float quality = 0.75f;
        final int dpi = 72;

        Instant i1 = Instant.now();
        int length = encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
        Instant i2 = Instant.now();

        long executionTime = Duration.between(i1, i2).toMillis();
        System.out.printf("optimize Huffman = %b: %d bytes, execution time %d ms%n",
                optimizeHuffman, length, executionTime);
        return executionTime;
    }

    public static void main(String[] args) throws IOException
    {
        final int runs = 100;
        long totalOptimizedExecutionTime = 0L;
        long totalUnoptimizedExecutionTime = 0L;

        BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));
        for (int i = 0; i < runs; i++)
        {
            totalOptimizedExecutionTime += benchmark(img, true);
            totalUnoptimizedExecutionTime += benchmark(img, false);
        }

        float avgOptimizedExecutionTime = (float) totalOptimizedExecutionTime / runs;
        float avgUnoptimizedExecutionTime = (float) totalUnoptimizedExecutionTime / runs;
        System.out.printf("Average optimized execution time: %f ms%n", avgOptimizedExecutionTime);
        System.out.printf("Average unoptimized execution time: %f ms%n", avgUnoptimizedExecutionTime);
    }
}
{code}

{code:sh}
...
optimize Huffman = true: 580768 bytes, execution time 192 ms
optimize Huffman = false: 589050 bytes, execution time 167 ms
Average optimized execution time: 192.729996 ms
Average unoptimized execution time: 167.929993 ms
{code}

I used an image I randomly picked from https://pixabay.com/ (attached to this issue). The results show that enabling setOptimizeHuffmanTables produces a slightly smaller file (580768 vs. 589050 bytes) but takes longer to execute (about 193 ms vs. 168 ms on average).
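
To put the reported figures in relative terms, here is a small sketch that computes the percentage change in size and time. The class name HuffmanTradeoff and the helper percentChange are made up for illustration and are not part of PDFBox; the numbers are copied from the benchmark output above. With those inputs the optimized output comes out roughly 1.4% smaller and roughly 15% slower, for this one image on this one machine.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: relates the benchmark figures to each other.
class HuffmanTradeoff {

    /** Relative change from baseline to candidate, in percent (negative means smaller). */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the benchmark output in this issue.
        long unoptimizedBytes = 589050;
        long optimizedBytes = 580768;
        double unoptimizedMs = 167.929993;
        double optimizedMs = 192.729996;

        System.out.printf(Locale.ROOT, "Size change: %.2f%%%n",
                percentChange(unoptimizedBytes, optimizedBytes));
        System.out.printf(Locale.ROOT, "Time change: %.2f%%%n",
                percentChange(unoptimizedMs, optimizedMs));
    }
}
```

Whether that trade is worthwhile depends on the workload, which is an argument for making the option something callers choose rather than a fixed default.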