[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372077#comment-16372077 ]
Hudson commented on TIKA-2563:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1436 (See [https://builds.apache.org/job/Tika-trunk/1436/])
TIKA-2563 -- Extract files embedded in HTML and javascript inside HTML (tallison: [https://github.com/apache/tika/commit/9ec8b43c269a79fe065c97eca66525a40eea3a41])
* (add) tika-parsers/src/main/java/org/apache/tika/parser/utils/DataURISchemeUtil.java
* (add) tika-parsers/src/main/java/org/apache/tika/parser/utils/DataURISchemeParseException.java
* (add) tika-parsers/src/main/java/org/apache/tika/parser/utils/DataURIScheme.java
* (add) tika-parsers/src/test/resources/test-documents/testHTML_embedded_img_in_js.html
* (add) tika-parsers/src/test/java/org/apache/tika/parser/utils/DataURISchemeParserTest.java
* (edit) CHANGES.txt
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/html/HtmlHandler.java
* (add) tika-parsers/src/test/resources/test-documents/testHTML_embedded_img.html
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java


> Extract embedded objects in HTML and javascript
> -----------------------------------------------
>
>                 Key: TIKA-2563
>                 URL: https://issues.apache.org/jira/browse/TIKA-2563
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Trivial
>             Fix For: 1.18, 2.0.0
>
>         Attachments: consumentenbond.html, testHTML_embedded_img.html
>
>
> Files (esp images) and other objects can be embedded in html/css/javascript
> with the [data: uri scheme|https://en.wikipedia.org/wiki/Data_URI_scheme].
> We should extract those like any other embedded file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
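
For readers unfamiliar with the data: URI scheme mentioned in the issue description, the following is a minimal sketch of how such a URI can be split into a media type and decoded bytes, which is the raw material an embedded-file extractor would need. It is not the code from the commit above: the class name `DataUriSketch`, the nested `Decoded` type, and the `parse` and `percentDecode` methods are hypothetical names chosen for illustration, and the sketch assumes the `data:[<mediatype>][;base64],<data>` layout with a `text/plain` default and ASCII-only literal characters in the payload.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Illustrative only; hypothetical helper, not Tika's DataURISchemeUtil. */
final class DataUriSketch {

    /** Media type plus the decoded payload bytes. */
    static final class Decoded {
        final String mediaType;
        final byte[] data;

        Decoded(String mediaType, byte[] data) {
            this.mediaType = mediaType;
            this.data = data;
        }
    }

    /** Splits "data:[<mediatype>][;base64],<data>" and decodes the payload. */
    static Decoded parse(String uri) {
        if (uri == null || !uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("not a data: URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing ',' separator");
        }
        String header = uri.substring(5, comma);
        String payload = uri.substring(comma + 1);

        // A trailing ";base64" token marks a base64-encoded payload.
        boolean base64 = header.regionMatches(true, header.length() - 7, ";base64", 0, 7);
        if (base64) {
            header = header.substring(0, header.length() - 7);
        }

        // Drop parameters such as ";charset=utf-8"; assumed default is text/plain.
        int semi = header.indexOf(';');
        String type = (semi >= 0 ? header.substring(0, semi) : header).trim();
        String mediaType = type.isEmpty() ? "text/plain" : type.toLowerCase(Locale.ROOT);

        byte[] data = base64
                ? Base64.getDecoder().decode(payload)
                : percentDecode(payload);
        return new Decoded(mediaType, data);
    }

    /** Decodes %XX escapes; literal characters are assumed to be ASCII. */
    static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad percent escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Decoded d = parse("data:text/plain;base64,SGVsbG8=");
        System.out.println(d.mediaType + " -> " + new String(d.data, StandardCharsets.US_ASCII));
    }
}
```

In an HTML parser, the string passed to `parse` would typically come from an attribute value such as an `img` tag's `src`, or from a match inside inline script text; how the decoded bytes are then handed off for embedded-document handling is outside the scope of this sketch.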