[ 
https://issues.apache.org/jira/browse/PDFBOX-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18000566#comment-18000566
 ] 

Tilman Hausherr commented on PDFBOX-6030:
-----------------------------------------

Thank you. To me it looks like the percentage lost in time (roughly 15% 
slower) is higher than the percentage gained in size (roughly 1.4% smaller), 
so I'm leaning towards providing yet another method with the additional 
parameter.
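
One way such an additional parameter could be exposed without changing 
existing behaviour is an overload that the current signature delegates to. 
The sketch below is JDK-only and purely illustrative: JpegEncoderSketch and 
both encode methods are hypothetical names, not PDFBox API, and the DPI 
metadata handling from the benchmark is left out.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical stand-in for the encoder; this is not PDFBox's JPEGFactory.
public class JpegEncoderSketch {

    /** Existing-style entry point: behaviour unchanged, optimization off. */
    public static byte[] encode(BufferedImage image, float quality) throws IOException {
        return encode(image, quality, false);
    }

    /** Hypothetical overload that carries the additional parameter. */
    public static byte[] encode(BufferedImage image, float quality, boolean optimizeHuffman)
            throws IOException {
        ImageWriter writer = findJpegWriter();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(ios);
            JPEGImageWriteParam param = (JPEGImageWriteParam) writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            param.setOptimizeHuffmanTables(optimizeHuffman);
            // null stream metadata and null image metadata: let the writer use its defaults
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        // Read the buffer only after the ImageOutputStream has been closed.
        return baos.toByteArray();
    }

    /** Picks the first JPEG writer whose default parameters are a JPEGImageWriteParam. */
    private static ImageWriter findJpegWriter() throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
        while (writers.hasNext()) {
            ImageWriter writer = writers.next();
            if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
                return writer;
            }
            writer.dispose();
        }
        throw new IOException("No ImageWriter found for JPEG format");
    }
}
```

Callers of the two-argument method would see no change, while callers who 
prefer smaller output over speed could pass true to the three-argument one.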

> JPEGFactory: createImage and setOptimizeHuffmanTables
> -----------------------------------------------------
>
>                 Key: PDFBOX-6030
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6030
>             Project: PDFBox
>          Issue Type: Wish
>            Reporter: Zer Jun Eng
>            Priority: Minor
>         Attachments: zoo-711050_1920.jpg
>
>
> Dear PDFBox developers,
> I'm writing to request an enhancement to the JPEGFactory class, specifically 
> concerning the createFromImage(PDDocument document, BufferedImage image, 
> float quality, int dpi) method.
> Currently, when using this method, there isn't a direct way to enable the 
> setOptimizeHuffmanTables option of JPEGImageWriteParam. This optimization can 
> be quite beneficial for reducing file size.
> To work around this, my team currently has to copy the JPEGFactory source 
> code into our project and modify the private encodeImageToJPEGStream method. 
> This approach isn't ideal as it makes maintenance more difficult and prevents 
> us from easily updating to new PDFBox versions.
> Would you consider exposing this setOptimizeHuffmanTables option, perhaps as 
> an additional parameter to the createFromImage method or through a separate 
> setter on JPEGFactory? This would allow users to leverage this optimization 
> without resorting to workarounds.
> Thank you for considering this request.
> —
> Replying to the email thread: 
> https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3
> I wrote a minimal benchmark that compares output file size and execution 
> time with and without setOptimizeHuffmanTables:
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.ByteArrayOutputStream;
> import java.io.File;
> import java.io.IOException;
> import java.time.Duration;
> import java.time.Instant;
> import java.util.Iterator;
> import javax.imageio.IIOImage;
> import javax.imageio.ImageIO;
> import javax.imageio.ImageTypeSpecifier;
> import javax.imageio.ImageWriteParam;
> import javax.imageio.ImageWriter;
> import javax.imageio.metadata.IIOMetadata;
> import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
> import javax.imageio.stream.ImageOutputStream;
> import org.w3c.dom.Element;
> class Huffman {
>   private static ImageWriter getJPEGImageWriter() throws IOException {
>     Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
>     while (writers.hasNext()) {
>       ImageWriter writer = writers.next();
>       if (writer == null) {
>         continue;
>       }
>       // PDFBOX-3566: avoid CLibJPEGImageWriter, which is not a
>       // JPEGImageWriteParam
>       if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
>         return writer;
>       }
>       writer.dispose();
>     }
>     throw new IOException("No ImageWriter found for JPEG format");
>   }
>   public static byte[] encodeImageToJPEGStream(BufferedImage image, float quality,
>       int dpi, boolean optimizeHuffman) throws IOException {
>     ImageWriter imageWriter = getJPEGImageWriter(); // find JAI writer
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
>       imageWriter.setOutput(ios);
>       // add compression
>       JPEGImageWriteParam jpegParam =
>           (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
>       jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
>       jpegParam.setCompressionQuality(quality);
>       jpegParam.setOptimizeHuffmanTables(optimizeHuffman);
>       // add metadata
>       ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
>       IIOMetadata data =
>           imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
>       Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
>       Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
>       String dpiString = Integer.toString(dpi);
>       jfif.setAttribute("Xdensity", dpiString);
>       jfif.setAttribute("Ydensity", dpiString);
>       jfif.setAttribute("resUnits", "1"); // 1 = dots/inch
>       // write
>       imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
>       return baos.toByteArray();
>     } finally {
>       imageWriter.dispose();
>     }
>   }
>   public static long benchmark(BufferedImage img, boolean optimizeHuffman)
>       throws IOException {
>     final float quality = 0.75f;
>     final int dpi = 72;
>     Instant i1 = Instant.now();
>     int length =
>         encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
>     Instant i2 = Instant.now();
>     long executionTime = Duration.between(i1, i2).toMillis();
>     System.out.printf("optimize Huffman = %b: %d bytes, execution time %d ms%n",
>         optimizeHuffman, length, executionTime);
>     return executionTime;
>   }
>   public static void main(String[] args) throws IOException {
>     final int runs = 100;
>     long totalOptimizedExecutionTime = 0L;
>     long totalUnoptimizedExecutionTime = 0L;
>     BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));
>     for (int i = 0; i < runs; i++) {
>       totalOptimizedExecutionTime += benchmark(img, true);
>       totalUnoptimizedExecutionTime += benchmark(img, false);
>     }
>
>     float avgOptimizedExecutionTime =
>         (float) totalOptimizedExecutionTime / runs;
>     float avgUnoptimizedExecutionTime =
>         (float) totalUnoptimizedExecutionTime / runs;
>     System.out.printf("Average optimized execution time: %f ms%n",
>         avgOptimizedExecutionTime);
>     System.out.printf("Average unoptimized execution time: %f ms%n",
>         avgUnoptimizedExecutionTime);
>   }
> }
> {code}
> {code:sh}
> ...
> optimize Huffman = true: 580768 bytes, execution time 192 ms
> optimize Huffman = false: 589050 bytes, execution time 167 ms
> Average optimized execution time: 192.729996 ms
> Average unoptimized execution time: 167.929993 ms
> {code}
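> I used an image I randomly picked from https://pixabay.com/ (attached to 
> this issue as zoo-711050_1920.jpg). The results show that enabling 
> setOptimizeHuffmanTables produces a slightly smaller file (580768 vs. 589050 
> bytes, about 1.4% smaller) but takes longer to execute (about 192.7 ms vs. 
> 167.9 ms on average, about 15% slower).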
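
To put the size and time figures on the same scale, the relative changes can 
be computed directly from the benchmark output quoted above. This is only a 
small sketch: the class and method names are made up for illustration, and 
the inputs are copied from the single run reported in the issue, so the 
percentages say nothing about other images or machines.

```java
import java.util.Locale;

// Illustrative helper; the names are hypothetical and not part of PDFBox.
public class HuffmanTradeoff {

    /** Relative change of value against baseline, in percent. */
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the benchmark output quoted in the issue.
        long unoptimizedBytes = 589050L;
        long optimizedBytes = 580768L;
        double unoptimizedMs = 167.929993;
        double optimizedMs = 192.729996;

        // Negative means smaller output: about -1.4% here.
        System.out.println(String.format(Locale.ROOT, "size change: %.2f%%",
            percentChange(unoptimizedBytes, optimizedBytes)));
        // Positive means slower encoding: about +14.8% here.
        System.out.println(String.format(Locale.ROOT, "time change: %.2f%%",
            percentChange(unoptimizedMs, optimizedMs)));
    }
}
```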



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
