On Fri, 2020-11-06 at 13:14 +0100, F. E. wrote:
> one of our customers uses a self-built pdf file, which is processed
> with podofo and causes a stack overflow crash inside of podofo when
> trying to load the file.
Hi,
I gave it a quick test and it doesn't crash here (trunk at r2016). It
may be that my stack size is larger than yours. I get an exception
thrown with this content:
PoDoFo encountered an error. Error: 21 ePdfError_InvalidXRef
Callstack:
#0 Error Source: src/podofo/doc/PdfMemDocument.cpp:263
Information: Handler fixes issue #49
#1 Error Source: src/podofo/base/PdfParser.cpp:272
Information: Unable to load objects from file.
#2 Error Source: src/podofo/base/PdfParser.cpp:375
Information: Unable to load xref entries.
#3 Error Source: src/podofo/base/PdfParser.cpp:974
#4 Error Source: src/podofo/base/PdfParser.cpp:974
#5 Error Source: src/podofo/base/PdfParser.cpp:974
#6 Error Source: src/podofo/base/PdfParser.cpp:974
#7 Error Source: src/podofo/base/PdfParser.cpp:974
#8 Error Source: src/podofo/base/PdfParser.cpp:974
...
#251 Error Source: src/podofo/base/PdfParser.cpp:974
#252 Error Source: src/podofo/base/PdfParser.cpp:974
#253 Error Source: src/podofo/base/PdfParser.cpp:104
Using gdb I get this backtrace (cut for brevity):
Breakpoint 1, PoDoFo::PdfParser::ReadXRefStreamContents (this=0x674820,
lOffset=566598, bReadOnlyTrailer=false) at src/podofo/base/PdfParser.cpp:974
974 e.AddToCallstack( __FILE__, __LINE__ );
(gdb) l
969 } catch(PdfError &e) {
970 /* Be forgiving, the error happens when an entry in XRef stream points
971 to a wrong place (offset) in the PDF file. */
972 if( e != ePdfError_NoNumber )
973 {
974 e.AddToCallstack( __FILE__, __LINE__ );
975 throw e;
976 }
977 }
978 }
(gdb) p e
$1 = (PoDoFo::PdfError &) @0x6779f0: {_vptr.PdfError = 0x5d83f8 <vtable for
PoDoFo::PdfError+16>, m_error = PoDoFo::ePdfError_InvalidXRef, m_callStack =
std::deque with 1 element = {{m_nLine = 104,
m_sFile = "src/podofo/base/PdfParser.cpp", m_sInfo = "", m_swInfo =
L""}}, static s_DgbEnabled = true, static s_LogEnabled = true, static
m_fLogMessageCallback = 0x0}
(gdb) b 972
Breakpoint 2 at 0x58c2f6: file src/podofo/base/PdfParser.cpp, line 972.
(gdb) bt
#0 PoDoFo::PdfParser::ReadXRefStreamContents (this=0x674820, lOffset=566598,
bReadOnlyTrailer=false) at src/podofo/base/PdfParser.cpp:974
#1 0x000000000058b61b in PoDoFo::PdfParser::ReadXRefContents (this=0x674820,
lOffset=566598, bPositionAtEnd=false) at src/podofo/base/PdfParser.cpp:727
#2 0x000000000058c290 in PoDoFo::PdfParser::ReadXRefStreamContents
(this=0x674820, lOffset=574009, bReadOnlyTrailer=false) at
src/podofo/base/PdfParser.cpp:968
#3 0x000000000058b61b in PoDoFo::PdfParser::ReadXRefContents (this=0x674820,
lOffset=574009, bPositionAtEnd=false) at src/podofo/base/PdfParser.cpp:727
#4 0x000000000058c290 in PoDoFo::PdfParser::ReadXRefStreamContents
(this=0x674820, lOffset=581388, bReadOnlyTrailer=false) at
src/podofo/base/PdfParser.cpp:968
#5 0x000000000058b61b in PoDoFo::PdfParser::ReadXRefContents (this=0x674820,
lOffset=581388, bPositionAtEnd=false) at src/podofo/base/PdfParser.cpp:727
#6 0x000000000058c290 in PoDoFo::PdfParser::ReadXRefStreamContents
(this=0x674820, lOffset=588148, bReadOnlyTrailer=false) at
src/podofo/base/PdfParser.cpp:968
#7 0x000000000058b61b in PoDoFo::PdfParser::ReadXRefContents (this=0x674820,
lOffset=588148, bPositionAtEnd=false) at src/podofo/base/PdfParser.cpp:727
#8 0x000000000058c290 in PoDoFo::PdfParser::ReadXRefStreamContents
(this=0x674820, lOffset=594866, bReadOnlyTrailer=false) at
src/podofo/base/PdfParser.cpp:968
#9 0x000000000058b61b in PoDoFo::PdfParser::ReadXRefContents (this=0x674820,
lOffset=594866, bPositionAtEnd=false) at src/podofo/base/PdfParser.cpp:727
#10 0x000000000058c290 in PoDoFo::PdfParser::ReadXRefStreamContents
(this=0x674820, lOffset=602251, bReadOnlyTrailer=false) at
src/podofo/base/PdfParser.cpp:968
> I do not know the parsing code well enough to understand what goes
> wrong with the pdf file
Neither do I. The gdb backtrace suggests the parser is making progress,
because lOffset changes with each nested call.
It looks like the file contains 325 '%%EOF' markers and 325
'startxref' directives, which is quite an inefficient way to create PDF
files, from my point of view. I am not saying a file cannot be created
this way, only that it is inefficient.
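Counts like these can be reproduced with a simple substring scan. A
minimal sketch, assuming the whole file has been read into a
std::string; CountMarker is a hypothetical helper, not part of PoDoFo:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a PoDoFo API): count non-overlapping
// occurrences of `marker` in `data`.
std::size_t CountMarker(const std::string& data, const std::string& marker)
{
    if (marker.empty())
        return 0; // an empty marker would otherwise match forever

    std::size_t count = 0;
    for (std::size_t pos = data.find(marker);
         pos != std::string::npos;
         pos = data.find(marker, pos + marker.size()))
    {
        ++count;
    }
    return count;
}
```

Calling it once with "%%EOF" and once with "startxref" on the file
contents gives the two numbers to compare.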
On the other hand, it shows that PoDoFo's check for recursion
in the XRef table misbehaves (because there is no real recursion here)
and that reading the XRef table this way can cause a stack overflow
in some cases.
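As an illustration only (this is not PoDoFo code), here is a minimal
sketch of how a chain of XRef sections could be followed with a loop
instead of nested calls, so that the stack depth does not grow with the
number of incremental updates. It assumes each section links to the
previous one through a /Prev entry. The names WalkXRefChain and
PrevTable are hypothetical, and the map merely stands in for parsing the
real sections. Remembering the visited offsets is one way to tell a real
loop from a chain that is merely long:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the parsed file: maps the offset of an xref
// section to the offset named by its /Prev entry. A missing key means
// the section has no /Prev and the chain ends there.
using PrevTable = std::map<std::int64_t, std::int64_t>;

// Follow the chain iteratively and return the offsets in visiting order.
// Throws if an offset is seen twice, which is a genuine cycle.
std::vector<std::int64_t> WalkXRefChain(const PrevTable& prev,
                                        std::int64_t start)
{
    std::vector<std::int64_t> order;
    std::set<std::int64_t> visited;
    std::int64_t offset = start;

    for (;;) {
        // insert() reports whether the offset was new
        if (!visited.insert(offset).second)
            throw std::runtime_error("xref chain loops back on itself");
        order.push_back(offset);

        const auto it = prev.find(offset);
        if (it == prev.end())
            break; // no /Prev: oldest section reached
        offset = it->second;
    }
    return order;
}
```

With this shape a file with 325 sections costs 325 loop iterations and
325 set entries, while a file whose /Prev points back to an earlier
section is rejected on the first repeated offset.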
Bye,
zyx