Rotated text isn't extracted correctly from PDFs
------------------------------------------------
Key: TIKA-723
URL: https://issues.apache.org/jira/browse/TIKA-723
Project: Tika
Issue Type: Bug
Components: parser
Reporter: Michael McCandless
Priority: Minor
Attachments: rotated.pdf
I have an example PDF whose text is rotated 90 degrees; Tika emits nearly
every character on its own line. That is, the doc has "Some rotated text,
here!" but Tika produces this:
{noformat}
<body><div class="page"><p>So
m
e
r
o
t
a
t
e
d
t
e
x
t
,
h
e
r
e
!</p>
{noformat}
I'm able to copy/paste the text out correctly.
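
A minimal sketch of a check that could back a regression test for this, assuming the extracted text is already available as a String (for example from Tika's `parseToString`). The class `RotatedTextCheck` and its methods are hypothetical names for illustration, not part of Tika, and the sample input is the text content of the output quoted above with the markup removed.

```java
// Hypothetical helper, not part of Tika: flags extracted text whose characters
// are all present but have been split across lines.
class RotatedTextCheck {

    /** Removes all whitespace, including line breaks. */
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    /** Fraction of non-blank lines that hold exactly one character. */
    static double singleCharLineRatio(String extracted) {
        int nonBlank = 0;
        int single = 0;
        for (String line : extracted.split("\\R")) {
            String t = line.trim();
            if (t.isEmpty()) {
                continue;
            }
            nonBlank++;
            if (t.codePointCount(0, t.length()) == 1) {
                single++;
            }
        }
        return nonBlank == 0 ? 0.0 : (double) single / nonBlank;
    }

    /** True when every character survived but the phrase no longer appears intact. */
    static boolean isFragmented(String extracted, String expected) {
        boolean sameChars = stripWhitespace(extracted).equals(stripWhitespace(expected));
        return sameChars && !extracted.contains(expected);
    }

    public static void main(String[] args) {
        String expected = "Some rotated text, here!";
        // Text content of the output quoted in this report, markup removed.
        String actual = String.join("\n",
                "So", "m", "e", "r", "o", "t", "a", "t", "e", "d",
                "t", "e", "x", "t", ",", "h", "e", "r", "e", "!");
        System.out.println("fragmented: " + isFragmented(actual, expected));
        System.out.println("single-char line ratio: " + singleCharLineRatio(actual));
    }
}
```

Comparing with whitespace stripped separates this symptom (characters intact, line breaks wrong) from a case where characters are actually missing or reordered.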