GitHub user rzo1 added a comment to the discussion: Don't index sitemap in opensearch

The sitemap parser bolt sets a metadata key, [`isSitemap`](https://stormcrawler.apache.org/docs/3.5.1/index.html#_sitemapparserbolt). You can filter on that key in the indexer's configuration to keep sitemaps out of the index: [`indexer.md.filter`](https://stormcrawler.apache.org/docs/3.5.1/index.html#_indexing)

GitHub link: https://github.com/apache/stormcrawler/discussions/1803#discussioncomment-15679470
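
As a rough sketch of the `indexer.md.filter` setting mentioned above, the crawler configuration might look something like this. It assumes the filter takes a single `key=value` pair that a document's metadata must contain in order to be indexed, and that the sitemap parser bolt marks regular pages with `isSitemap=false`. Both points are assumptions to check against the linked documentation for your version.

```yaml
config:
  # Assumption: only documents whose metadata contains this exact
  # key=value pair are passed on to the indexer.
  # Assumption: non-sitemap pages carry isSitemap=false.
  indexer.md.filter: "isSitemap=false"
```

If the filter behaves this way, pages that never pass through the sitemap parser bolt would lack the key altogether and would also be skipped, so it is worth testing on a small crawl first.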
