[ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132744#comment-17132744 ]
Tim Allison commented on TIKA-3109:
-----------------------------------

Wait, we already are...
https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/html/HtmlHandler.java#L181

This should be fairly straightforward. I can probably take this next week unless someone wants to beat me to it.

> Ingest attachment: failed to extract text from iframe
> -----------------------------------------------------
>
>                 Key: TIKA-3109
>                 URL: https://issues.apache.org/jira/browse/TIKA-3109
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.22
>         Environment: * Apache Tika 1.22
> * {{Java}}
> {{java 13.0.2 2020-01-14}}
> * {{Ubuntu 18.04.1 LTS}}
> {{Linux XXXXX 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux}}
>            Reporter: Younes
>            Priority: Major
>
> This standalone
> [HTML|https://github.com/elastic/elasticsearch/files/4757855/c0711285-8ab7-46c3-b730-7c0639466537.html.zip]
> page has all its CSS/JS/IMAGEs embedded.
> After indexing it using ElasticSearch, we tried to search the keyword
> *logarithmic* which exists. Unfortunately, we couldn't find it.
> [~dadoonet] was able to reproduce the issue which is fully described
> [elasticsearch|https://github.com/elastic/elasticsearch/issues/57924]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)