Gordon Vidal created TIKA-3828:
----------------------------------

             Summary: OneNote Parser - Parsed Files are Missing Parts of the 
Content
                 Key: TIKA-3828
                 URL: https://issues.apache.org/jira/browse/TIKA-3828
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.28.4, 2.4.1
            Reporter: Gordon Vidal
         Attachments: TestSection1 (1).one, TikaParserErrorScreenshot.png

OneNote files that I receive from SharePoint Online are currently not parsed 
correctly: parts of the content are missing from the parsed output. See the 
attached screenshot and OneNote section file.

I have been able to consistently reproduce this issue by doing the following:
 * Create a OneNote document with multiple sections.
 * Edit the OneNote document using the "Open in Desktop App" option and make 
changes in different sections, saving between edits. I have used both OneNote 
2016 (Version 1808) and OneNote 2021 (Version 2108).
 * Download a section of the OneNote document using the SharePoint Online REST 
API.
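
The download step above could be scripted roughly as follows. This is only a 
minimal sketch: the report does not say which REST endpoint or authentication 
method was used, so the GetFileByServerRelativeUrl(...)/$value endpoint, the 
bearer-token header, and the names OneNoteSectionDownload, buildDownloadUrl 
and download are all assumptions made for illustration (Java 11+).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika or of the original report.
final class OneNoteSectionDownload {

    // Builds the REST URL that returns the raw bytes of a file.
    // Assumption: the section is fetched via GetFileByServerRelativeUrl.
    static String buildDownloadUrl(String siteUrl, String serverRelativePath) {
        // Drop a single trailing slash so the result has no "//".
        String base = siteUrl.endsWith("/")
                ? siteUrl.substring(0, siteUrl.length() - 1)
                : siteUrl;
        // Single quotes are doubled inside an OData string literal.
        String quoted = serverRelativePath.replace("'", "''");
        // URLEncoder does form encoding, so turn "+" back into "%20"
        // and keep "/" as the path separator.
        String encoded = URLEncoder.encode(quoted, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("%2F", "/");
        return base + "/_api/web/GetFileByServerRelativeUrl('"
                + encoded + "')/$value";
    }

    // Downloads the section to 'target'.
    // Assumption: an OAuth bearer token is accepted by the site.
    static Path download(String siteUrl, String serverRelativePath,
                         String accessToken, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(buildDownloadUrl(siteUrl, serverRelativePath)))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(target));
        // The body is written to 'target' even on errors, so check the status.
        if (response.statusCode() != 200) {
            throw new IOException("Download failed with HTTP status "
                    + response.statusCode());
        }
        return response.body();
    }
}
```

The downloaded .one file can then be passed to Tika in the usual way to check 
whether the extracted text is complete.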

I will be investigating this issue myself as well. The Tika codebase is quite 
new to me, so any information about the status of this bug, its potential 
cause, and any plans to fix it would be very welcome.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
