https://bugs.documentfoundation.org/show_bug.cgi?id=158329
--- Comment #13 from David Huggins-Daines <d...@ecolingui.ca> ---
(In reply to خالد حسني from comment #12)
> On top of that, ToUnicode mapping must be unique, a glyph can appear there
> only once, but fonts might map different characters to the same glyph, and
> in this case ToUnicode can be used for one of these mappings, and all the
> others will need ActualText.

Thank you for the really detailed explanation! In this particular regression we
have a sort of ligature, so ToUnicode should work, but I understand why it
isn't sufficient in the more general case.

I'll try to do a best-effort implementation of ActualText for
pdfminer/pdfplumber, since as you say it gets used for the smallest span of
text necessary, and since text extraction is best-effort by definition anyway.

I haven't checked to see if poppler, qpdf, pdfium, and company are working on
ActualText support...
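
For illustration only, here is a toy sketch of the fallback order described above: prefer ActualText where a span carries it, otherwise fall back to the per-glyph ToUnicode entry. It is not pdfminer or pdfplumber code. The names `Span`, `build_to_unicode` and `extract_text` are hypothetical, and treating ToUnicode as a plain glyph-to-string dictionary is a simplifying assumption.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Sequence, Tuple


class Span(NamedTuple):
    """Hypothetical stand-in for a run of glyphs in a content stream."""
    glyphs: Sequence[int]               # glyph IDs shown in this span
    actual_text: Optional[str] = None   # ActualText override, if the span has one


def build_to_unicode(
    pairs: Iterable[Tuple[int, str]],
) -> Tuple[Dict[int, str], List[Tuple[int, str]]]:
    """Keep the first text seen for each glyph.

    Returns the glyph -> text map plus the (glyph, text) pairs that did not
    fit, i.e. the mappings that would need ActualText instead.
    """
    to_unicode: Dict[int, str] = {}
    needs_actual_text: List[Tuple[int, str]] = []
    for glyph, text in pairs:
        if glyph not in to_unicode:
            to_unicode[glyph] = text
        elif to_unicode[glyph] != text:
            needs_actual_text.append((glyph, text))
    return to_unicode, needs_actual_text


def extract_text(spans: Iterable[Span], to_unicode: Dict[int, str]) -> str:
    """Best-effort extraction: ActualText wins, ToUnicode is the fallback."""
    parts: List[str] = []
    for span in spans:
        if span.actual_text is not None:
            parts.append(span.actual_text)
        else:
            # Unknown glyphs become U+FFFD rather than raising.
            parts.append("".join(to_unicode.get(g, "\ufffd") for g in span.glyphs))
    return "".join(parts)
```

In this sketch, a glyph 42 mapped to "fi" (a ligature) is handled by ToUnicode alone, while a glyph 7 shared by Latin "A" and Greek "Α" gets only one ToUnicode entry, so the other use has to carry ActualText on its span.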