https://bugs.kde.org/show_bug.cgi?id=334068
--- Comment #4 from Jaan Vajakas <jaanvaja...@hot.ee> ---
The problem with this file is that the bounding boxes of "T" and "A" overlap. Okular's layout detection algorithm considers two glyphs to belong to the same word only if the second glyph's bounding box touches the right side of the first one's exactly (after rounding to integer pixels at a certain resolution). If there is any overlap or gap, the glyphs are treated as separate words.

I think I can write a small patch to solve it: accept an overlap (or maybe also a gap) of up to a certain percentage of the width of the following character.

In the long run, layout detection will never be 100% perfect, and the XY Cut approach that Okular uses has some fundamental limitations in particular. I think the layout detection in Okular would benefit from a major refactoring to
1) use existing text flow information in the file if it is available (Tagged PDF, ePUB, OpenDocument etc.), and
2) for files where text flow data is really missing, reuse algorithms from other similar projects to save the research and development effort.

For the current file, however, 1) would not help, since it is not a Tagged PDF, i.e. it is one of the kind that Albert described in his comment.
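
To make the proposed tolerance concrete, here is a rough sketch in Python. It is not Okular code (Okular is written in C++) and not the actual patch: the `Glyph` class, the `same_word` and `group_words` functions, and the default tolerance of 0.2 are all hypothetical names and values chosen for illustration. The sketch assumes glyphs of a single text line arrive in reading order, and it accepts both overlaps and gaps, which the comment above only suggests as a possibility.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record; not Okular's actual data structure."""
    char: str
    left: float   # x coordinate of the bounding box's left edge
    right: float  # x coordinate of the bounding box's right edge

    @property
    def width(self) -> float:
        return self.right - self.left


def same_word(prev: Glyph, nxt: Glyph, tolerance: float = 0.2) -> bool:
    """Return True if nxt may continue the word that prev belongs to.

    The signed distance is positive for a gap and negative for an overlap.
    Either is accepted if its size is at most `tolerance` times the width
    of the following glyph. The default 0.2 is an arbitrary example value.
    """
    distance = nxt.left - prev.right
    return abs(distance) <= tolerance * nxt.width


def group_words(glyphs: list[Glyph], tolerance: float = 0.2) -> list[str]:
    """Group the glyphs of one text line (in reading order) into words."""
    words: list[str] = []
    prev = None
    for glyph in glyphs:
        if prev is not None and same_word(prev, glyph, tolerance):
            words[-1] += glyph.char   # continue the current word
        else:
            words.append(glyph.char)  # start a new word
        prev = glyph
    return words
```

With `tolerance=0.0` the check reduces to the exact-touch rule described above, so "T" (0 to 10) followed by an overlapping "A" (9 to 19) would be split into two words, while any tolerance of at least 0.1 keeps them together. A real patch would also have to pick the tolerance so that ordinary inter-word spaces are still detected as gaps.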