[ https://issues.apache.org/jira/browse/TIKA-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2711:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.0-BETA

> When parsing a UNIX text file apostrophes are rendered as ?
> -----------------------------------------------------------
>
>                 Key: TIKA-2711
>                 URL: https://issues.apache.org/jira/browse/TIKA-2711
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.18
>         Environment: Windows 10
>            Reporter: Ichbiah
>            Priority: Minor
>             Fix For: 2.0.0-BETA
>
>         Attachments: long_text_dos.txt, long_text_unix.txt, petit_dos.txt,
> petit_unix.txt
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I have a small text file in two versions:
> * a dos version of the file
> * a unix version of the file
> Both contain the same text below:
> La politique macroéconomique cesse officiellement d’être
> l’alpha et l’oméga de la lutte contre le chômage.
> When I parse them using the tika-app.jar, the text is correctly "extracted"
> from the DOS version of the file. For the UNIX version of the file the
> apostrophes are falsely rendered as question marks.
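
The report does not state a root cause. As a hedged illustration only, the sketch below shows one general way a typographic apostrophe (U+2019) can turn into a question mark in Java: encoding text into a charset that has no mapping for that character. The class name ApostropheDemo and the method roundTrip are hypothetical and are not part of Tika; whether this mechanism is what actually happens in TIKA-2711 is an assumption, not something the report confirms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tika and not taken from the report.
public class ApostropheDemo {

    // Encode the text to bytes in the given charset, then decode it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for unmappable characters, which is '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // A fragment of the sample sentence, written with Unicode escapes
        // so that the encoding of this source file does not matter.
        String sample = "d\u2019\u00eatre";

        String latin1 = roundTrip(sample, StandardCharsets.ISO_8859_1);
        String utf8 = roundTrip(sample, StandardCharsets.UTF_8);

        // U+2019 has no ISO-8859-1 mapping, so it comes back as '?',
        // while the accented e (U+00EA) survives.
        System.out.println(latin1.equals("d?\u00eatre")); // prints true

        // UTF-8 can represent U+2019, so the text is unchanged.
        System.out.println(utf8.equals(sample)); // prints true
    }
}
```

If the two attachments differ in anything besides line endings, or if the output charset of the console differs between runs, a comparison along these lines may help narrow down where the character is lost.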