[ https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921468#comment-17921468 ]
Hudson commented on TIKA-4303:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #616 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/616/])
TIKA-4303: Handle OneNotePropertyEnum.CachedTitleString as RichEditTextUnicode (#2098) (github: [https://github.com/apache/tika/commit/7f94520bfcbbf201c40da946249d912301bc58f8])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/onenote/OneNoteParserTest.java
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/test-documents/test-tika-4303-Chinese-notes.one
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNoteTreeWalker.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNoteTreeWalkerOptions.java


> Unable to extract Chinese content in onenote
> --------------------------------------------
>
> Key: TIKA-4303
> URL: https://issues.apache.org/jira/browse/TIKA-4303
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.8.0, 2.9.2
> Reporter: lqangi
> Priority: Major
> Attachments: Chinese-notes.one, tika-parsing-chinese-notes-result.png
>
>
> When I tried to use Tika to extract the contents of a OneNote file containing
> Chinese text, the Chinese text could not be extracted; only the non-Chinese
> content was extracted.
> In addition, some of the extracted content is duplicated. As described in
> [TIKA-3970|https://issues.apache.org/jira/browse/TIKA-3970], Tika seems to
> extract historical versions of the data along with the current content. I don't
> know whether TIKA-3970 has been fixed (I see that code for it has been
> committed on GitHub, but it doesn't seem to have completely solved the
> problem yet).
> The software versions I use are as follows:
> Tika: 2.8.0
> OneNote: Microsoft® OneNote® LTSC MSO (16.0.14332.20761)
>
> To reproduce this problem, use version 2.8.0 of Tika App to open the
> attachment "Chinese-notes.one" and check whether the Chinese content in the
> file is extracted.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
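One plausible reading of the commit title ("Handle OneNotePropertyEnum.CachedTitleString as RichEditTextUnicode") is that the page-title property needed to be read as Unicode text rather than as single-byte text. The sketch below is a hypothetical illustration of that failure mode, not Tika's OneNoteTreeWalker code. It assumes the property payload is UTF-16LE, and it uses an arbitrary sample title of four Han characters (U+4E2D U+6587 U+7B14 U+8BB0) followed by " notes". Decoding those bytes one byte per character cannot yield any Chinese character, while UTF-16LE decoding recovers them. The class and method names (`OneNoteTextDecodingSketch`, `decodeUtf16Le`, `decodeSingleByte`, `containsHan`) are invented for this example.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration, not Tika's OneNoteTreeWalker code.
 * Assumption: a Unicode text property such as a page title is stored
 * as UTF-16LE bytes.
 */
public class OneNoteTextDecodingSketch {

    // Decode raw property bytes as UTF-16LE: two bytes per BMP character.
    static String decodeUtf16Le(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    // Decode the same bytes one byte per character (ISO-8859-1). Every
    // resulting char is in U+0000..U+00FF, so no CJK character can appear.
    static String decodeSingleByte(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // True if the text contains at least one Han (Chinese) code point.
    static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Sample title written with Unicode escapes so the source file
        // encoding does not matter: four Han characters, then " notes".
        String title = "\u4E2D\u6587\u7B14\u8BB0 notes";
        byte[] raw = title.getBytes(StandardCharsets.UTF_16LE);

        String asUnicode = decodeUtf16Le(raw);
        String asSingleByte = decodeSingleByte(raw);

        System.out.println("UTF-16LE round trip ok: " + asUnicode.equals(title));
        System.out.println("Han after UTF-16LE decode: " + containsHan(asUnicode));
        System.out.println("Han after single-byte decode: " + containsHan(asSingleByte));
    }
}
```

Running `main` prints `true`, `true`, `false` for the three checks. Note that the ASCII part (" notes") survives the single-byte decode, interleaved with NUL characters, which matches the reported symptom of only the non-Chinese content coming through. A check like `containsHan` could also be applied to the text Tika App extracts from Chinese-notes.one when following the reproduction steps above.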