[ 
https://issues.apache.org/jira/browse/TIKA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912892#comment-17912892
 ] 

ASF GitHub Bot commented on TIKA-4303:
--------------------------------------

tballison commented on PR #2098:
URL: https://github.com/apache/tika/pull/2098#issuecomment-2589895087

   Thank you for opening this and looking deeply into the code. Is there any 
way to create a unit test for this issue? Do any of our current unit test files 
show what this fixes? This is not required...we do what we can.
   
   Thank you again, and yes, fixes to the other issues would be great.




> Unable to extract Chinese content in onenote
> --------------------------------------------
>
>                 Key: TIKA-4303
>                 URL: https://issues.apache.org/jira/browse/TIKA-4303
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.8.0, 2.9.2
>            Reporter: lqangi
>            Priority: Major
>         Attachments: Chinese-notes.one, tika-parsing-chinese-notes-result.png
>
>
> When I tried to extract the contents of a OneNote file containing Chinese 
> text using Tika, the Chinese part of the file could not be extracted; only 
> the non-Chinese content was extracted.
> In addition, some of the extracted content is duplicated, as described in 
> [TIKA-3970|https://issues.apache.org/jira/browse/TIKA-3970]. Tika seems to 
> extract historical versions of the data along with the current content. I 
> don't know whether TIKA-3970 has been fixed (I see that code has been 
> committed on GitHub, but it doesn't seem to have completely solved the 
> problem yet).
> The software versions I use are as follows:
> Tika: 2.8.0
> OneNote: Microsoft® OneNote® LTSC MSO (16.0.14332.20761)
>  
> To reproduce this problem, open the attachment "Chinese-notes.one" with the 
> 2.8.0 version of Tika App and check whether the Chinese content in the file 
> is extracted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
