Yes, please go ahead and reopen TIKA-738... sounds like something is wrong!

Thanks.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Nov 25, 2011 at 9:25 PM, John M <[email protected]> wrote:
> Hello,
>
> When I use the latest build of the Tika application jar's CLI with the
> -h option to parse testAnnotations.pdf (from the parsers' test
> documents folder), added in TIKA-738, the result has two "<p>"
> elements and three "</p>" elements.  Attempting to open this file in
> the GUI also causes it to crash with an NPE--the same one described in
> TIKA-778.  I see in issue PDFBox-1143 that the code introduced for
> TIKA-738 will go away once this PDFBox issue is resolved, but perhaps
> meanwhile PDF2XHTML.java should be modified to produce a different
> number of "</p>" elements:  should one of the
> "handler.endElement("p");" lines be removed from the endPage method?
>
> Thanks,
> John Mastarone
>
