Zer Jun Eng created PDFBOX-6030:
-----------------------------------

             Summary: JPEGFactory: createImage and setOptimizeHuffmanTables
                 Key: PDFBOX-6030
                 URL: https://issues.apache.org/jira/browse/PDFBOX-6030
             Project: PDFBox
          Issue Type: Wish
            Reporter: Zer Jun Eng
         Attachments: zoo-711050_1920.jpg

Dear PDFBox developers,

I'm writing to request an enhancement to the JPEGFactory class, specifically 
concerning the createFromImage(PDDocument document, BufferedImage image, float 
quality, int dpi) method.

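Currently, this method offers no way to enable the setOptimizeHuffmanTables 
option of JPEGImageWriteParam. With this option the encoder computes Huffman 
tables from the image's own statistics instead of using the standard tables, 
which can reduce the file size without any loss in image quality.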
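Because the option belongs to the standard javax.imageio JPEGImageWriteParam, its effect can be checked without PDFBox. The following is a minimal sketch under stated assumptions: it uses whichever JDK JPEG writer ImageIO returns, and the class name HuffmanCheck and the synthetic gradient image are illustrative only. It encodes the same image with and without optimized tables and decodes both results. Since the option only changes the entropy-coding stage, the two files are expected to decode to identical pixels, while the optimized one is usually a little smaller.

{code:java}
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;

class HuffmanCheck {

  // Encodes the image as JPEG; optimizeHuffman only affects the entropy coding.
  static byte[] encode(BufferedImage image, float quality, boolean optimizeHuffman)
      throws IOException {
    Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
    ImageWriter writer = null;
    while (writers.hasNext()) {
      ImageWriter candidate = writers.next();
      if (candidate.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
        writer = candidate;
        break;
      }
      candidate.dispose();
    }
    if (writer == null) {
      throw new IOException("No JPEG writer with JPEGImageWriteParam found");
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
      writer.setOutput(ios);
      JPEGImageWriteParam param = (JPEGImageWriteParam) writer.getDefaultWriteParam();
      param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
      param.setCompressionQuality(quality);
      param.setOptimizeHuffmanTables(optimizeHuffman);
      writer.write(null, new IIOImage(image, null, null), param);
    } finally {
      writer.dispose();
    }
    return baos.toByteArray(); // the stream has been closed (and flushed) here
  }

  // Decodes the JPEG and returns all pixels as packed RGB ints.
  static int[] decodePixels(byte[] jpeg) throws IOException {
    BufferedImage img = ImageIO.read(new ByteArrayInputStream(jpeg));
    return img.getRGB(0, 0, img.getWidth(), img.getHeight(), null, 0, img.getWidth());
  }

  // Synthetic RGB test image: smooth gradients plus a diagonal stripe pattern.
  static BufferedImage testImage() {
    BufferedImage img = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
    for (int y = 0; y < 256; y++) {
      for (int x = 0; x < 256; x++) {
        int stripe = ((x + y) / 8) % 2 == 0 ? 40 : 0;
        int r = Math.min(255, x + stripe);
        int g = y;
        int b = (x * y) / 256;
        img.setRGB(x, y, (r << 16) | (g << 8) | b);
      }
    }
    return img;
  }

  public static void main(String[] args) throws IOException {
    BufferedImage img = testImage();
    byte[] standard = encode(img, 0.75f, false);
    byte[] optimized = encode(img, 0.75f, true);
    System.out.printf("standard tables: %d bytes, optimized tables: %d bytes%n",
        standard.length, optimized.length);
    System.out.println("identical pixels: "
        + Arrays.equals(decodePixels(standard), decodePixels(optimized)));
  }
}
{code}

The pixel comparison is the interesting part: it shows that the size saving comes purely from better entropy coding, so quality settings and visual output are unaffected.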

To work around this, my team currently has to copy the JPEGFactory source code 
into our project and modify the private encodeImageToJPEGStream method. This 
approach isn't ideal as it makes maintenance more difficult and prevents us 
from easily updating to new PDFBox versions.

Would you consider exposing this setOptimizeHuffmanTables option, perhaps as an 
additional parameter to the createFromImage method or through a separate setter 
on JPEGFactory? This would allow users to leverage this optimization without 
resorting to workarounds.
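To make the parameter-based variant of the request concrete, here is a hypothetical sketch. JpegEncodeOptions and its methods are invented for this example and are not part of PDFBox. Since JPEGFactory exposes only static methods, a setter on JPEGFactory would most likely be static, process-wide state; a per-call options object avoids that. The sketch only shows how such an object could configure the JPEGImageWriteParam that the encoder already builds.

{code:java}
import java.util.Locale;
import javax.imageio.ImageWriteParam;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;

// Hypothetical options holder; NOT an existing PDFBox class.
final class JpegEncodeOptions {
  private final float quality;
  private final int dpi;
  private final boolean optimizeHuffmanTables;

  JpegEncodeOptions(float quality, int dpi, boolean optimizeHuffmanTables) {
    if (quality < 0f || quality > 1f) {
      throw new IllegalArgumentException("quality must be in [0, 1]: " + quality);
    }
    this.quality = quality;
    this.dpi = dpi;
    this.optimizeHuffmanTables = optimizeHuffmanTables;
  }

  // Default mirrors the current encoder, which never enables Huffman optimization.
  static JpegEncodeOptions of(float quality, int dpi) {
    return new JpegEncodeOptions(quality, dpi, false);
  }

  // Returns a copy with the flag changed; instances stay immutable and thread-safe.
  JpegEncodeOptions withOptimizeHuffmanTables(boolean optimize) {
    return new JpegEncodeOptions(quality, dpi, optimize);
  }

  int dpi() {
    return dpi;
  }

  // Applies the options to the write param the encoder already creates.
  void applyTo(JPEGImageWriteParam param) {
    param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    param.setCompressionQuality(quality);
    param.setOptimizeHuffmanTables(optimizeHuffmanTables);
  }

  public static void main(String[] args) {
    JpegEncodeOptions options = JpegEncodeOptions.of(0.75f, 72).withOptimizeHuffmanTables(true);
    JPEGImageWriteParam param = new JPEGImageWriteParam(Locale.ROOT);
    options.applyTo(param);
    System.out.println(param.getOptimizeHuffmanTables() + " " + param.getCompressionQuality());
  }
}
{code}

A call site could then pass such an object to a new createFromImage overload (again hypothetical), leaving the existing four-argument method unchanged with its current behavior.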

Thank you for considering this request.

—

Replying to the email thread: 
https://lists.apache.org/thread/pgo0m1r8vgxd12zl3499fv38s163mpm3

I wrote a minimal benchmark that compares the output file size and execution 
time with and without setOptimizeHuffmanTables:

{code:java}
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.jpeg.JPEGImageWriteParam;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;

class Huffman {

  private static ImageWriter getJPEGImageWriter() throws IOException {
    Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpeg");
    while (writers.hasNext()) {
      ImageWriter writer = writers.next();
      if (writer == null) {
        continue;
      }
      // PDFBOX-3566: avoid CLibJPEGImageWriter, whose default write param
      // is not a JPEGImageWriteParam
      if (writer.getDefaultWriteParam() instanceof JPEGImageWriteParam) {
        return writer;
      }
      writer.dispose();
    }
    throw new IOException("No ImageWriter found for JPEG format");
  }

  public static byte[] encodeImageToJPEGStream(BufferedImage image, float quality, int dpi,
      boolean optimizeHuffman) throws IOException {
    // find a JPEG writer whose default write param is a JPEGImageWriteParam
    ImageWriter imageWriter = getJPEGImageWriter();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
      imageWriter.setOutput(ios);

      // add compression
      JPEGImageWriteParam jpegParam = (JPEGImageWriteParam) imageWriter.getDefaultWriteParam();
      jpegParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
      jpegParam.setCompressionQuality(quality);

      jpegParam.setOptimizeHuffmanTables(optimizeHuffman);

      // add metadata
      ImageTypeSpecifier imageTypeSpecifier = new ImageTypeSpecifier(image);
      IIOMetadata data = imageWriter.getDefaultImageMetadata(imageTypeSpecifier, jpegParam);
      Element tree = (Element) data.getAsTree("javax_imageio_jpeg_image_1.0");
      Element jfif = (Element) tree.getElementsByTagName("app0JFIF").item(0);
      String dpiString = Integer.toString(dpi);
      jfif.setAttribute("Xdensity", dpiString);
      jfif.setAttribute("Ydensity", dpiString);
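      jfif.setAttribute("resUnits", "1"); // 1 = dots/inch
      // the tree returned by getAsTree is not guaranteed to be live,
      // so write the edited tree back into the metadata
      data.setFromTree("javax_imageio_jpeg_image_1.0", tree);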

      // write
      imageWriter.write(data, new IIOImage(image, null, null), jpegParam);
    } finally {
      imageWriter.dispose();
    }
    // read the bytes only after the ImageOutputStream has been closed (and thus flushed)
    return baos.toByteArray();
  }

  public static long benchmark(BufferedImage img, boolean optimizeHuffman) throws IOException {
    final float quality = 0.75f;
    final int dpi = 72;

    Instant i1 = Instant.now();
    int length = encodeImageToJPEGStream(img, quality, dpi, optimizeHuffman).length;
    Instant i2 = Instant.now();
    long executionTime = Duration.between(i1, i2).toMillis();

    System.out.printf("optimize Huffman = %b: %d bytes, execution time %d ms%n",
        optimizeHuffman, length, executionTime);
    return executionTime;
  }

  public static void main(String[] args) throws IOException {
    final int runs = 100;
    long totalOptimizedExecutionTime = 0L;
    long totalUnoptimizedExecutionTime = 0L;

    BufferedImage img = ImageIO.read(new File("zoo-711050_1920.jpg"));

    for (int i = 0; i < runs; i++) {
      totalOptimizedExecutionTime += benchmark(img, true);
      totalUnoptimizedExecutionTime += benchmark(img, false);
    }
    
    float avgOptimizedExecutionTime = (float) totalOptimizedExecutionTime / runs;
    float avgUnoptimizedExecutionTime = (float) totalUnoptimizedExecutionTime / runs;

    System.out.printf("Average optimized execution time: %f ms%n", avgOptimizedExecutionTime);
    System.out.printf("Average unoptimized execution time: %f ms%n", avgUnoptimizedExecutionTime);
  }
}
{code}

{code:sh}
...
optimize Huffman = true: 580768 bytes, execution time 192 ms
optimize Huffman = false: 589050 bytes, execution time 167 ms
Average optimized execution time: 192.729996 ms
Average unoptimized execution time: 167.929993 ms
{code}

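I used a randomly chosen image from https://pixabay.com/ (attached as 
zoo-711050_1920.jpg). The results show that enabling setOptimizeHuffmanTables 
produces a slightly smaller file (580,768 vs. 589,050 bytes, about 1.4% 
smaller) but takes longer to encode (192.7 ms vs. 167.9 ms on average over 100 
runs, roughly 15% slower).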
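The averages above also include the first iterations, before the JIT has compiled the encoding path, and each measurement is truncated to whole milliseconds by toMillis(). A minimal harness sketch that discards warm-up runs and times with the monotonic System.nanoTime() could make the comparison somewhat more robust. WarmBench, Task and averageMillis are illustrative names only, and a dedicated tool such as JMH would be more rigorous still.

{code:java}
import java.util.concurrent.TimeUnit;

final class WarmBench {

  // Illustrative task type: any action whose duration we want to measure.
  interface Task {
    void run() throws Exception;
  }

  // Runs `warmup` untimed iterations, then returns the mean of `runs` timed ones in ms.
  static double averageMillis(Task task, int warmup, int runs) throws Exception {
    if (runs <= 0) {
      throw new IllegalArgumentException("runs must be positive");
    }
    for (int i = 0; i < warmup; i++) {
      task.run(); // give the JIT a chance to compile the hot path before measuring
    }
    long totalNanos = 0L;
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime(); // monotonic, unlike the wall clock behind Instant.now()
      task.run();
      totalNanos += System.nanoTime() - start;
    }
    return (double) totalNanos / runs / TimeUnit.MILLISECONDS.toNanos(1);
  }

  public static void main(String[] args) throws Exception {
    // Stand-in workload; in the benchmark this would be
    // () -> encodeImageToJPEGStream(img, 0.75f, 72, true)
    Task work = () -> {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 100_000; i++) {
        sb.append(i % 10);
      }
      if (sb.length() != 100_000) {
        throw new IllegalStateException("unexpected length");
      }
    };
    System.out.printf("average: %.3f ms%n", averageMillis(work, 10, 50));
  }
}
{code}

In the benchmark above, the task would wrap encodeImageToJPEGStream with optimizeHuffman set to true and to false, measured separately after the same number of warm-up runs.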


