https://bugs.kde.org/show_bug.cgi?id=380456

Stefan Brüns <stefan.bru...@rwth-aachen.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefan.bruens@rwth-aachen.d
                   |                            |e

--- Comment #22 from Stefan Brüns <stefan.bru...@rwth-aachen.de> ---
(In reply to tagwerk19 from comment #21)
> Created attachment 143869 [details]
> pdftotext results from
> https://ipfs.io/ipfs/QmVqWhPuQkE7reTN5F9TiSeA75z62VNaZUSFZz3FdWTLbC
> 
> (In reply to Adam Fontenot from comment #20)
> > ... The file, in their view, is pathological ...
> Applying a modicum of patience, running:
> 
>     nice -19 pdftotext QmVqWhPuQkE7reTN5F9TiSeA75z62VNaZUSFZz3FdWTLbC.pdf
> 
> took 37 hours on a machine with 16GB memory 8-]
> 
> The process gradually ate memory, reaching 10 GB. There wasn't an obvious
> impact on performance - but I would expect you'd see that bite when reaching
> the limits/starting to swap.

The long runtime is caused by an algorithmically poor implementation, i.e.
O(n^2) where e.g. O(n log n) is sufficient. The huge memory footprint is caused
by a problematic data arrangement and overly greedy pre-/overallocation.
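
To illustrate the kind of complexity gap meant here, below is a minimal sketch
in Python. It is not poppler code: the task (smallest gap between any two
numbers) and the function names are made up for illustration. The first
variant compares every pair, the second sorts once and only looks at
neighbours.

```python
from itertools import combinations


def min_gap_quadratic(values):
    """Compare every pair of values: O(n^2) comparisons."""
    return min(abs(a - b) for a, b in combinations(values, 2))


def min_gap_sorted(values):
    """Sort once, then compare adjacent values only: O(n log n) overall."""
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:]))
```

Both functions need at least two values and return the same result. For a
few hundred items the difference hardly matters, but the pairwise variant
does roughly n/2 times as many comparisons as there are items, which is
what turns minutes into hours on very large inputs.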

I have filed two MRs [1], [2] for poppler. With both applied, the extraction
runs in ~50 seconds on my 3-year-old laptop, with a peak memory consumption of
1.8 GByte.

[1] https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1514
[2] https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1515
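
For anyone who wants to compare runtime and peak memory before and after the
MRs, here is a rough sketch of one way to measure both from Python on a
Unix-like system. The helper name `measure` is hypothetical, and this is not
necessarily how the numbers above were obtained. Note that `ru_maxrss` is
reported in kilobytes on Linux (bytes on macOS) and covers the largest child
process waited for so far.

```python
import resource
import subprocess
import time


def measure(cmd):
    """Run cmd to completion; return (wall-clock seconds, peak RSS of children)."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
    elapsed = time.monotonic() - start
    # Peak resident set size among child processes that have been waited for.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss
```

Usage would be along the lines of `measure(["pdftotext", "input.pdf"])`, with
`input.pdf` standing in for the file under test.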
