https://bugs.documentfoundation.org/show_bug.cgi?id=158329

--- Comment #13 from David Huggins-Daines <d...@ecolingui.ca> ---
(In reply to ⁨خالد حسني⁩ from comment #12)
> On top of that, the ToUnicode mapping must be unique: a glyph can appear
> there only once. But fonts might map different characters to the same glyph,
> and in this case ToUnicode can be used for only one of these mappings, and
> all the others will need ActualText.

Thank you for the really detailed explanation!  In this particular regression
we have a sort of ligature, so ToUnicode should work, but I understand why it
isn't sufficient in the more general case.
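
To make the constraint concrete, here is a minimal sketch (plain Python, not
taken from LibreOffice or pdfminer; the function name and the
(text, glyph id) pair format are assumptions made up for illustration). A
ligature maps several characters to one glyph and fits in ToUnicode, while a
second, different character sharing an already-mapped glyph has to fall back
to ActualText:

```python
def split_mappings(char_to_glyph):
    """Split (text, glyph_id) pairs into a ToUnicode table and leftovers.

    Hypothetical helper: a glyph may appear in ToUnicode only once, so the
    first text seen for a glyph wins and every other text mapped to that
    same glyph is reported as needing ActualText.
    """
    to_unicode = {}          # glyph id -> text (at most one entry per glyph)
    needs_actual_text = []   # (text, glyph id) pairs ToUnicode cannot express
    for text, glyph in char_to_glyph:
        if glyph not in to_unicode:
            to_unicode[glyph] = text
        elif to_unicode[glyph] != text:
            needs_actual_text.append((text, glyph))
    return to_unicode, needs_actual_text


# Illustrative data: glyph 10 is an "fi" ligature; glyph 5 is shared by
# Latin "A" and Greek capital alpha (U+0391).
pairs = [("fi", 10), ("A", 5), ("\u0391", 5)]
to_unicode, leftovers = split_mappings(pairs)
```

In this example the ligature is covered by ToUnicode alone, and only the
Greek alpha ends up in the leftover list.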

I'll try to do a best-effort implementation of ActualText for
pdfminer/pdfplumber, since as you say it gets used for the smallest span of
text necessary, and since text extraction is best-effort by definition anyway.
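
Roughly what I have in mind, as a sketch only: the event tuples below are a
made-up simplification, not pdfminer's or pdfplumber's actual API. When a
marked-content sequence carries ActualText, emit that string once and skip
the glyph text inside the span, ignoring unbalanced EMC operators rather than
failing:

```python
def extract_text(events):
    """Best-effort text extraction honouring ActualText.

    `events` is a hypothetical, simplified stream of tuples:
      ("BDC", props_dict)  start of a marked-content sequence
      ("EMC",)             end of a marked-content sequence
      ("text", string)     text decoded from glyphs (e.g. via ToUnicode)
    """
    out = []
    stack = []      # per open sequence: True if it carried ActualText
    suppress = 0    # number of enclosing ActualText spans
    for event in events:
        kind = event[0]
        if kind == "BDC":
            actual = event[1].get("ActualText")
            has_actual = actual is not None
            # Only the outermost ActualText is emitted; nested ones are
            # already replaced by it.
            if has_actual and suppress == 0:
                out.append(actual)
            stack.append(has_actual)
            if has_actual:
                suppress += 1
        elif kind == "EMC":
            # Tolerate a stray EMC with nothing open.
            if stack and stack.pop():
                suppress -= 1
        elif kind == "text" and suppress == 0:
            out.append(event[1])
    return "".join(out)


# Illustrative input: the glyph text is the single ligature character
# U+FB01, but the span's ActualText says "fi".
events = [
    ("text", "of"),
    ("BDC", {"ActualText": "fi"}),
    ("text", "\ufb01"),
    ("EMC",),
    ("text", "ce"),
]
result = extract_text(events)
```

Keeping a counter instead of a single flag is what lets nested marked content
inside an ActualText span stay suppressed until the outer span closes.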

I haven't checked to see if poppler, qpdf, pdfium, and company are working on
ActualText support...
