Dear podofo users, I'm currently investigating some forged pdf documents which try to circumvent the signature checks of various pdf viewers. Maybe you heard about that: https://www.pdf-insecurity.org/signature/signature.html
When trying to load the used pdfs there, nearly all of them produced some loading errors (which is fine, I guess). But four of them could be loaded, so I could check them against my own signature validation code. Sadly, my code did not find any signature fields for two of the four documents. When debugging this and stepping through the podofo code, I learned that there is some issue when reading the xref table. As it turned out, the forged pdf did not respect the fixed xref entry size of 20 characters, but used only a LF were CR+LF is needed. See: https://www.pdf-insecurity.org/download/exploits/20_Soda_PDF_Desktop/siwa.pdf Podofo is doing right in expecting the entry to be 20 characters long. But when the entry is shorter, Podofo does not recognize this and continues with the parsing, likely creating bogus (if it not entirely fails). I think that's not a good behaviour. It would be better if the parsing fails as soon as an entry shorter (or larger) than 20 chars is encountered. And if I'm not mistaken, such a check is not even difficult to implement: PdfParser.cpp:894ff > // XRefEntry is defined in PDF spec section 7.5.4 Cross-Reference Table as > // nnnnnnnnnn ggggg n eol > // nnnnnnnnnn is 10-digit offset number with max value 9999999999 (bigger > than 2**32 = 4GB) > // ggggg is a 5-digit generation number with max value 99999 (smaller than > 2**17) > int read = sscanf( m_buffer.GetBuffer(), > "%10" PDF_FORMAT_INT64 " %5" PDF_FORMAT_INT64 > " %c%c%c", > &llOffset, &llGeneration, &cUsed, &empty1, > &empty2 ); > > if ( read != 5 ) > { > // part of XrefEntry is missing, or i/o error > PODOFO_RAISE_ERROR( ePdfError_InvalidXRef ); > } > empty1 and empty2 are two characters for holding CR('\r') and LF('\n'). So when an xref entry misses the CR (or the LF), empty2 will hold the first character of the next xref entry. So why not check the two empty variables if they hold the required characters and failing when they don't: if ( read != 5 || *( empty1 != '\r' ) || ( empty2 != '\n' )* ) > Or a little bit less strict (allowing LF+CR): if ( read != 5 || *( empty1 != '\r' && empty2 != 'r' ) || ( empty1 != '\n' > && empty2 != '\n' )* ) > With this proposed check, wrong xref entry sizes would be handled more consistantly / reported straight away instead of silently continuing with a wrong offset. We could also add an additional error code for it instead of using ePdfError_InvalidXRef. What do you think about that? Greetings F.E.
podofo_xref_entry_crlf_check.patch
Description: Binary data
_______________________________________________ Podofo-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/podofo-users
