https://bz.apache.org/bugzilla/show_bug.cgi?id=69172

            Bug ID: 69172
           Summary: PDF parse incorrect one character a line
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: bighu...@hotmail.com
  Target Milestone: ---

Created attachment 39793
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=39793&action=edit
the incorrect result parsed by Tika and Tika Server 2.9.2 and 3.0beta

The attached PDF cannot be parsed correctly by Tika 2.9.2 or 3.0beta, in
either the server version or the standalone version. A line break is
inserted after every character. This happens with symbols, English letters,
and CJK characters.

In the server version:

curl -g -T "sample.pdf" http://localhost:889/tika --header "Accept: text/plain"

In the standalone version:

java.exe -jar "C:\TikaSearch\tika-app-2.9.2.jar" --text

Both of the above deliver the incorrect result for the attached PDF.

The bug appears only with documents from some scanner models. Other scanned
documents are parsed correctly.
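
The symptom described in this report (a line break after every character) can be checked for mechanically once the extracted text has been saved. The following is only a rough sketch in Python, not part of Tika or POI: the function names and the 0.8 threshold are hypothetical, chosen for illustration, and the check would also flag text that legitimately has one character per line.

```python
def single_char_line_ratio(text: str) -> float:
    """Return the fraction of non-empty lines that hold exactly one character.

    Lines are stripped of surrounding whitespace first, so a line that
    contains only a space counts as empty and is ignored.
    """
    non_empty = [line.strip() for line in text.splitlines() if line.strip()]
    if not non_empty:
        return 0.0
    single = sum(1 for line in non_empty if len(line) == 1)
    return single / len(non_empty)


def looks_like_one_char_per_line(text: str, threshold: float = 0.8) -> bool:
    """Heuristic: True when most non-empty lines are a single character.

    The threshold is an assumption for illustration, not a tuned value.
    """
    return single_char_line_ratio(text) >= threshold


if __name__ == "__main__":
    # Simulate the reported symptom: every character on its own line.
    broken = "\n".join("Hello, 世界")
    normal = "Hello, 世界\nA second line of text"
    print(looks_like_one_char_per_line(broken))
    print(looks_like_one_char_per_line(normal))
```

To apply it to real output, the text returned by either command above could be redirected to a file and its contents passed to `looks_like_one_char_per_line`.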