https://bz.apache.org/bugzilla/show_bug.cgi?id=69172

            Bug ID: 69172
           Summary: PDF parse incorrect one character a line
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: bighu...@hotmail.com
  Target Milestone: ---

Created attachment 39793
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=39793&action=edit
the incorrect result parsed by Tika and Tika Server 2.9.2 and 3.0beta

The attached PDF cannot be parsed correctly by Tika 2.9.2 or 3.0beta, in
either the server version or the standalone app.

A line break is added after every character. This happens with symbols,
English letters, and CJK characters.
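
The symptom described above (almost every character on its own line) can be
checked mechanically on the extracted text. The following is only a minimal
sketch in Python, not part of Tika: the function names and the 0.9 threshold
are hypothetical choices made for illustration, and the input is assumed to be
the plain-text output already saved from Tika.

```python
def single_char_line_ratio(text: str) -> float:
    """Return the fraction of non-blank lines that hold exactly one character."""
    stripped = [line.strip() for line in text.splitlines()]
    non_blank = [line for line in stripped if line]
    if not non_blank:
        return 0.0
    single = sum(1 for line in non_blank if len(line) == 1)
    return single / len(non_blank)


def looks_split_per_character(text: str, threshold: float = 0.9) -> bool:
    """Flag output where nearly every non-blank line is a single character.

    The threshold is an arbitrary illustrative cutoff, not a Tika setting.
    """
    return single_char_line_ratio(text) >= threshold
```

Running looks_split_per_character() on the output for an affected PDF and for
an unaffected scan would give a quick way to tell which documents trigger the
problem.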

In the server version, the command was:
  curl -g -T "sample.pdf" http://localhost:889/tika --header "Accept: text/plain"

In the standalone version, the command was:
  java.exe -jar "C:\TikaSearch\tika-app-2.9.2.jar" --text

Both of the above produce the incorrect result shown in the attached PDF.

The bug appears only with some scanner models. Other scanned documents are fine.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org
