Zoran Regvart created TIKA-4317:
-----------------------------------

             Summary: Abusive content on https://corpora.tika.apache.org/
                 Key: TIKA-4317
                 URL: https://issues.apache.org/jira/browse/TIKA-4317
             Project: Tika
          Issue Type: Bug
          Components: site
            Reporter: Zoran Regvart

The Apache Camel team has been notified by Google of abusive content hosted on https://corpora.tika.apache.org/, with the assumption that this is somehow related to https://camel.apache.org.

Google scans the whole apache.org domain, so the implication is that abusive content found on any subdomain of apache.org will be attributed to, and affect, the other subdomains of apache.org.

Learn about abusive experiences here: https://support.google.com/webtools/answer/7347327.

Page singled out in the Google report (content and possibly a security warning):

{code}https://corpora.tika.apache.org/base/docs/commoncrawl3/QK/QKKJTNDRIVLIPP7433IFC3EF3UVOSPIB{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)