Hi,

It's tricky... The easiest approach is probably to uncompress the whole file and remove all references to the images. Each image has to be removed from the resources and also from the content streams, and you have to find the correct ones first. It takes more time to explain than to do, so here is the result:

https://drive.google.com/file/d/1AmehgqCs3dkz_zR6TCrxNPOxpRTz09sb/view?usp=sharing

Please tell me when you've got it, so I can delete it.

Tilman

On 20.01.2025 18:10, Aaron Mulder wrote:
Thank you!  I guess I'm too inexperienced with PDFDebugger -- I didn't find
that page background.

So my next question is, is there a way to use PDFBox to open this PDF,
delete those three images (and any references to them) and then save it
back out?

I'm able to run a similar grep to your example and see the lines matching
the large images, but I don't understand enough to know what the object IDs
of the images themselves are.  I guess I'm kind of hoping that if I had
their IDs I could go into PDFBox and open the file and try to delete the
items with those IDs?  But I'm speculating here, lol.  Don't know if I'd
have to manually track down all usage of them and clear it in order to
write a valid PDF back out.

Thanks,
        Aaron

On Mon, Jan 20, 2025 at 11:51 AM Tilman Hausherr <thaush...@t-online.de>
wrote:

I looked at it with PDFDebugger... There's an image of 4 MB (compressed)
that is used by both pages, likely a background.

Then I looked at it with Notepad++ and searched for /Length. This was
possible because the file doesn't use compressed object streams.

There is a second large image of about 5 MB. Then I ran a regular
expression search and got this:

Line  10698: <</BitsPerComponent 8/ColorSpace 9 0 R/Filter/FlateDecode/Height 3234/Intent/RelativeColorimetric/*Length 4784102*/Metadata 87 0 R/Name/X/Subtype/Image/Type/XObject/Width 2522>>stream
Line  49536: <</BitsPerComponent 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors 1/Columns 2447>>/Filter/FlateDecode/Height 3161/Intent/RelativeColorimetric/*Length 5345666*/Name/X/Subtype/Image/Type/XObject/Width 2447>>stream
Line 103697: <</BitsPerComponent 8/ColorSpace/DeviceGray/DecodeParms<</BitsPerComponent 1/Colors 1/Columns 2448>>/Filter/FlateDecode/Height 3161/Intent/RelativeColorimetric/*Length 5509296*/Name/X/Subtype/Image/Type/XObject/Width 2448>>stream

So you have 3 large images in total. I didn't bother researching where
they are used; your PDF is very nested. I suspect that these are more
backgrounds, which contain these "dirty" lines.
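
The /Length search described above could be scripted roughly as follows. This is a minimal Python sketch, not what was actually run for this thread: the function name large_images, the 1 MB threshold and the regular expressions are all assumptions made for illustration. It relies on the same condition mentioned above (no compressed object streams), only sees /Length values written directly in the stream dictionary, and could in principle be fooled by binary stream data that happens to look like an object header. Besides the length, it reports the object number in front of each dictionary, which is the object ID of the image.

```python
import re
import sys

# Matches "<num> <gen> obj << ... >> stream" without crossing an "endobj",
# so plain dictionaries that have no stream are never merged with a later object.
STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream", re.DOTALL
)
IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# A direct /Length value; an indirect reference such as "12 0 R" is skipped.
LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)")


def large_images(data: bytes, min_bytes: int = 1_000_000):
    """Return (object number, generation, stream length), largest first."""
    found = []
    for match in STREAM_OBJ.finditer(data):
        dictionary = match.group(3)
        length = LENGTH.search(dictionary)
        if IMAGE.search(dictionary) and length and int(length.group(1)) >= min_bytes:
            found.append(
                (int(match.group(1)), int(match.group(2)), int(length.group(1)))
            )
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Usage: python <this script> <file.pdf>
    with open(sys.argv[1], "rb") as handle:
        for number, generation, length in large_images(handle.read()):
            print(f"{number} {generation} obj: {length} bytes")
```

Each printed line names one image object; references to it elsewhere in the file look like "<number> <generation> R".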

Tilman

On 20.01.2025 16:56, Aaron Mulder wrote:
OK this is a long shot but... have a look at this PDF:

https://media.dndbeyond.com/compendium-images/phb/downloads/DnD_2024_Character-Sheet.pdf

It's 16 MB.  Eyeballing the thing, it doesn't seem like there's that much
complexity in there, though it does have a lot of background images or
textures.

Is there any way to inspect it for "what part of this is so huge?" and
possibly cut some things out to craft a version more like 1 MB?  You know,
if there are a few 4 MB images embedded I could just edit it to cut them
out, or whatever -- some loss of fanciness is OK to me.

I'm going to be creating a bunch of digital D&D character records with
that sheet and it seems like an epic waste of storage and bandwidth 😂

I looked at it in the PDF Debugger and couldn't find a way to identify all
the elements on the page, much less by "largest first", though I may have
missed something.

Thanks,
        Aaron



