Sebastian Nagel created TIKA-4350:
-------------------------------------

             Summary: HTML snippet containing <iframe> as root element erroneously recognized as application/xml
                 Key: TIKA-4350
                 URL: https://issues.apache.org/jira/browse/TIKA-4350
             Project: Tika
          Issue Type: Bug
          Components: detector, mime
    Affects Versions: 3.0.0
            Reporter: Sebastian Nagel

An HTML snippet containing an <iframe> element as the document root is erroneously recognized as {{application/xml}}.

This issue was reported on the Nutch user mailing list for Nutch 1.19 using Tika 2.3.0:
[https://lists.apache.org/thread/fhhp1p6y4ttxmplvz1ohk3wwjz25ozbc]

The problem is reproducible with Tika 3.0.0.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)