[ https://issues.apache.org/jira/browse/TIKA-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671452#comment-16671452 ]
Markus Jelsma commented on TIKA-2760:
-------------------------------------

Hello [~davemeikle],

Of course! I cannot understand why I did not see this. I am so sorry to have bothered anybody with this nuisance.

My apologies,
Markus

> LinkContentHandler does not report hyperlinks
> ---------------------------------------------
>
>                 Key: TIKA-2760
>                 URL: https://issues.apache.org/jira/browse/TIKA-2760
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.19
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.20
>
>         Attachments: TIKA-2760 - Test for Outlinks.diff, TIKA-2760.patch, ronaldmcdonald-nolinks.html
>
>
> Nutch uses LinkContentHandler for collecting hyperlinks, and it does not report
> any hyperlink for http://www.ronaldmcdonaldhouse.co.uk/, which I'll also
> attach to this ticket.
> Debugging LinkContentHandler to print element names in startElement reveals
> that only very few HTML elements get reported, which I think is incorrect.
> Our own parser in Nutch uses a custom ContentHandler and does report many
> elements, including hyperlinks.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)