https://bugs.kde.org/show_bug.cgi?id=380456
Stefan Brüns <stefan.bru...@rwth-aachen.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefan.bruens@rwth-aachen.de

--- Comment #22 from Stefan Brüns <stefan.bru...@rwth-aachen.de> ---
(In reply to tagwerk19 from comment #21)
> Created attachment 143869 [details]
> pdftotext results from
> https://ipfs.io/ipfs/QmVqWhPuQkE7reTN5F9TiSeA75z62VNaZUSFZz3FdWTLbC
>
> (In reply to Adam Fontenot from comment #20)
> > ... The file, in their view, is pathological ...
> Applying a modicum of patience, running:
>
>     nice -19 pdftotext QmVqWhPuQkE7reTN5F9TiSeA75z62VNaZUSFZz3FdWTLbC.pdf
>
> took 37 hours on a machine with 16GB memory 8-]
>
> The process gradually ate memory, reaching 10 GB. There wasn't an obvious
> impact on performance - but I would expect you'd see that bite when reaching
> the limits/starting to swap.

The long runtime is caused by an algorithmically poor implementation, i.e.
O(n^2) where e.g. O(n log n) would be sufficient. The huge memory footprint is
caused by a problematic data layout and overly greedy pre-/overallocation.

I have filed two MRs [1], [2] for poppler. With both applied, the extraction
runs in ~50 seconds on my 3-year-old laptop, with a peak memory consumption of
1.8 GByte.

[1] https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1514
[2] https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/1515
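
For readers unfamiliar with the O(n^2) versus O(n log n) distinction mentioned in the comment, here is a minimal, generic sketch. It is not poppler's code and does not reflect what the two MRs actually change; the function names and the "smallest gap between coordinates" task are purely hypothetical, chosen only to show how comparing every pair differs from sorting once and looking at neighbours.

```python
def smallest_gap_quadratic(xs):
    """Smallest distance between any two values, comparing every pair: O(n^2)."""
    best = float("inf")
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            best = min(best, abs(xs[i] - xs[j]))
    return best


def smallest_gap_sorted(xs):
    """Same result, but sort once and compare only neighbours: O(n log n)."""
    ordered = sorted(xs)
    # After sorting, the closest pair must be adjacent in the ordering.
    return min((b - a for a, b in zip(ordered, ordered[1:])), default=float("inf"))


if __name__ == "__main__":
    coords = [10, 3, 7.5, 1]  # hypothetical coordinates, illustration only
    print(smallest_gap_quadratic(coords), smallest_gap_sorted(coords))
```

Both functions return the same value. For a few thousand items the difference is negligible, but with millions of items the pairwise version needs on the order of 10^12 comparisons, which is the kind of gap that can separate seconds from many hours.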