Dear podofo users,

I'm currently investigating some forged pdf documents which try to
circumvent the signature checks of various pdf viewers. Maybe you heard
about that:
https://www.pdf-insecurity.org/signature/signature.html

When trying to load the used pdfs there, nearly all of them produced some
loading errors (which is fine, I guess). But four of them could be loaded,
so I could check them against my own signature validation code. Sadly, my
code did not find any signature fields for two of the four documents.
When debugging this and stepping through the podofo code, I learned that
there is some issue when reading the xref table. As it turned out, the
forged pdf did not respect the fixed xref entry size of 20 characters, but
used only a LF were CR+LF is needed. See:
https://www.pdf-insecurity.org/download/exploits/20_Soda_PDF_Desktop/siwa.pdf

Podofo is doing right in expecting the entry to be 20 characters long. But
when the entry is shorter, Podofo does not recognize this and continues
with the parsing, likely creating bogus (if it not entirely fails). I think
that's not a good behaviour. It would be better if the parsing fails as
soon as an entry shorter (or larger) than 20 chars is encountered. And if
I'm not mistaken, such a check is not even difficult to implement:

PdfParser.cpp:894ff

> // XRefEntry is defined in PDF spec section 7.5.4 Cross-Reference Table as
> // nnnnnnnnnn ggggg n eol
> // nnnnnnnnnn is 10-digit offset number with max value 9999999999 (bigger
> than 2**32 = 4GB)
> // ggggg is a 5-digit generation number with max value 99999 (smaller than
> 2**17)
> int read = sscanf( m_buffer.GetBuffer(),
>                             "%10" PDF_FORMAT_INT64 " %5" PDF_FORMAT_INT64
> " %c%c%c",
>                             &llOffset, &llGeneration, &cUsed, &empty1,
> &empty2 );
>
> if ( read != 5 )
> {
>     // part of XrefEntry is missing, or i/o error
>     PODOFO_RAISE_ERROR( ePdfError_InvalidXRef );
> }
>

empty1 and empty2 are two characters for holding CR('\r') and LF('\n'). So
when an xref entry
misses the CR (or the LF), empty2 will hold the first character of the next
xref entry. So why not check the two empty variables if they hold the
required characters and failing when they don't:

if ( read != 5 || *( empty1 != '\r' ) ||  ( empty2 != '\n' )* )
>

Or a little bit less strict (allowing LF+CR):

if ( read != 5 || *( empty1 != '\r' && empty2 != 'r' ) ||  ( empty1 != '\n'
> && empty2 != '\n' )* )
>

With this proposed check, wrong xref entry sizes would be handled more
consistantly / reported straight away instead of silently continuing with a
wrong offset. We could also add an additional error code for it instead of
using ePdfError_InvalidXRef.

What do you think about that?

Greetings
F.E.

Attachment: podofo_xref_entry_crlf_check.patch
Description: Binary data

_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users

Reply via email to