Yes, please go ahead and reopen TIKA-738... sounds like something is wrong! Thanks.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Nov 25, 2011 at 9:25 PM, John M <[email protected]> wrote:
> Hello,
>
> When I use the latest build of the Tika application jar's CLI with the
> -h option to parse testAnnotations.pdf (from the parsers' test
> documents folder), added in TIKA-738, the result has two "<p>"
> elements and three "</p>" elements. Attempting to open this file in
> the GUI also causes it to crash with a NPE--the same one described in
> TIKA-778. I see in issue PDFBox-1143 that the code introduced for
> TIKA-738 will go away once this PDFBox issue is resolved, but perhaps
> meanwhile PDF2XHTML.java should be modified to produce a different
> number of "</p>" elements: should one of the
> "handler.endElement("p");" lines be removed from the endPage method?
>
> Thanks,
> John Mastarone
