[ https://issues.apache.org/jira/browse/TIKA-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison reassigned TIKA-4368:
---------------------------------

    Assignee: Tim Allison

> Unable to correctly extract content in OneNote
> ----------------------------------------------
>
>                 Key: TIKA-4368
>                 URL: https://issues.apache.org/jira/browse/TIKA-4368
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: luman
>            Assignee: Tim Allison
>            Priority: Major
>         Attachments: Multilingual.one, Onenote-Screenshot.jpg, Tika-gui-Screenshot.jpg
>
>
> # Non-rich text content is not checked for the latest version, so when the content is TextExtendedAscii, it is still parsed repeatedly.
> # Time parsing does not detect the version and may extract repeatedly.
> # Dates are not parsed.
> # Non-ASCII characters are not extracted correctly.
> ## Garbled text
> ## No parsing performed
> The attachments include the original OneNote file, a screenshot of the OneNote app, and a screenshot of the TikaGUI app.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)