[jira] [Created] (TIKA-1289) Ligatures convert on text extraction

Alex Andrushchak (JIRA) Fri, 02 May 2014 08:02:25 -0700

Alex Andrushchak created TIKA-1289:
--------------------------------------

             Summary: Ligatures convert on text extraction
                 Key: TIKA-1289
                 URL: https://issues.apache.org/jira/browse/TIKA-1289
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.5
         Environment: win 8, jre 1.5
            Reporter: Alex Andrushchak



From reviewing the Tika sources, I see that Tika uses PDFBox to parse PDF files.
I found that PDFBox itself uses ICU4J to handle ligatures.
Unfortunately, when I added the icu4j jar to my classpath, nothing changed:
ligatures are still not converted. A sample PDF file is attached.
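
The expected conversion can be illustrated with the JDK alone. This is a minimal sketch, not how Tika or PDFBox handles ligatures internally: it assumes Java 6 or later (for java.text.Normalizer), and the class name LigatureNormalizer and the method expandLigatures are hypothetical names chosen for illustration. It applies Unicode NFKC normalization to already-extracted text, which expands ligature code points such as U+FB01 into their component letters.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
public class LigatureNormalizer {

    // Applies Unicode NFKC normalization, which replaces compatibility
    // characters (including the Latin ligatures U+FB00..U+FB06) with
    // their decomposed equivalents.
    public static String expandLigatures(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 is the "fi" ligature, U+FB02 is the "fl" ligature.
        String extracted = "\uFB01le and \uFB02ow";
        System.out.println(expandLigatures(extracted)); // prints "file and flow"
    }
}
```

Note that NFKC also rewrites other compatibility characters (for example full-width forms and superscript digits), so this may change more than ligatures.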



--
This message was sent by Atlassian JIRA
(v6.2#6252)
