https://bugs.kde.org/show_bug.cgi?id=334068
--- Comment #7 from Jaan Vajakas <jaanvaja...@hot.ee> ---

When testing with some PDF documents on my hard drive, I found that fixing this bug would cause a regression for some PDFs (OCR'ed papers) from JSTOR. These PDFs have slightly wrong bounding rectangles, and for them the current rule "two glyphs belong to the same word iff their bounding box edges exactly match" works best. (An example is http://www.jstor.org/stable/1970717, but unfortunately JSTOR charges for downloading the PDF unless you belong to a university that has a contract with them.)

However, those JSTOR PDFs are Tagged PDFs, and their Tagged PDF actual text content (which can be obtained by copying text from Acrobat Reader) is good. So, to avoid regressions, Tagged PDF support (i.e., not doing layout detection for Tagged PDFs) should also be added to Okular when fixing this bug.

Unfortunately, I didn't find a method returning the Tagged PDF actual text in the Qt4 interface of poppler. The only promising one was Poppler::Page::textList(), which Okular currently uses (Okular then applies its own layout-detection heuristics on top of it). However, testing with poppler 0.26.0 and 0.26.1 showed that textList() still doesn't return the Tagged PDF text; it returns the result of poppler's own layout detection.
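The word-grouping rule quoted above can be sketched as follows. This is a minimal Python illustration, not Okular's actual code. The `Glyph` type, the `same_word`/`group_words` helpers, the `tol` parameter, and the reading of "edges exactly match" (the right edge of one glyph equals the left edge of the next, with equal top and bottom edges) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Glyph:
    """One character and its bounding box (hypothetical type, not Okular's)."""
    char: str
    left: float
    top: float
    right: float
    bottom: float


def same_word(a: Glyph, b: Glyph, tol: float = 0.0) -> bool:
    """Assumed reading of the rule: b directly follows a on the same line.

    With tol == 0.0 this is the exact-match rule: a's right edge must equal
    b's left edge, and both boxes must share the same top and bottom.
    A positive tol (hypothetical) accepts small gaps or overlaps instead.
    """
    return (abs(a.right - b.left) <= tol
            and abs(a.top - b.top) <= tol
            and abs(a.bottom - b.bottom) <= tol)


def group_words(glyphs: List[Glyph], tol: float = 0.0) -> List[str]:
    """Group glyphs, given in reading order, into words using same_word()."""
    words: List[str] = []
    current = ""
    prev: Optional[Glyph] = None
    for g in glyphs:
        if prev is not None and same_word(prev, g, tol):
            current += g.char          # edges touch: extend the current word
        else:
            if current:
                words.append(current)  # edges don't touch: close the word
            current = g.char
        prev = g
    if current:
        words.append(current)
    return words


# Usage: "ab" and "cd" are separated by a 2-unit gap.
glyphs = [Glyph("a", 0, 0, 5, 10), Glyph("b", 5, 0, 10, 10),
          Glyph("c", 12, 0, 17, 10), Glyph("d", 17, 0, 22, 10)]
print(group_words(glyphs))           # ['ab', 'cd']
print(group_words(glyphs, tol=2.5))  # ['abcd']
```

With the exact rule, the gap keeps the two words apart; a tolerance large enough to absorb slightly-off bounding boxes can also merge genuinely separate words, which is the kind of trade-off that makes relaxing the rule risky.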