[Bug 69172] New: PDF parse incorrect one character a line

bugzilla Sun, 07 Jul 2024 22:06:50 -0700

https://bz.apache.org/bugzilla/show_bug.cgi?id=69172


            Bug ID: 69172
           Summary: PDF parse incorrect one character a line
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Created attachment 39793
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=39793&action=edit
the incorrect result parsed by Tika and Tika Server 2.9.2 and 3.0beta

The attached PDF cannot be parsed correctly by Tika 2.9.2 or 3.0beta, in either
the server version or the standalone version.

A "line break" is added after every character. This happens with symbols,
English letters, and CJK characters.

In the server version:
  curl -g -T "sample.pdf" http://localhost:889/tika --header "Accept: text/plain"

In the standalone version:
  java.exe -jar "C:\TikaSearch\tika-app-2.9.2.jar" --text

Both of the above deliver the incorrect result in the attached PDF.
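
The symptom (a line break after every character) can be checked mechanically
on the text returned by either command. The following is only a minimal sketch
in Python, assuming the extracted text is already available as a string; the
function names and the 0.8 threshold are illustrative assumptions and are not
part of Tika.

```python
def single_char_line_ratio(text: str) -> float:
    """Return the fraction of non-empty lines that hold exactly one character."""
    lines = [line.strip() for line in text.splitlines()]
    non_empty = [line for line in lines if line]
    if not non_empty:
        return 0.0
    single = sum(1 for line in non_empty if len(line) == 1)
    return single / len(non_empty)


def looks_broken(text: str, threshold: float = 0.8) -> bool:
    """Heuristic with an assumed threshold: most lines contain one character."""
    return single_char_line_ratio(text) >= threshold
```

A ratio close to 1.0 would match the behaviour reported here; normally
extracted text should give a ratio close to 0.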

The bug appears only with some scanner models. Other scanned documents are fine.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
