[ https://issues.apache.org/jira/browse/TIKA-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086479#comment-18086479 ]
ASF GitHub Bot commented on TIKA-4754:
--------------------------------------
tballison merged PR #2876:
URL: https://github.com/apache/tika/pull/2876
> Switch to bloom filters for common tokens in tika-eval
> ------------------------------------------------------
>
> Key: TIKA-4754
> URL: https://issues.apache.org/jira/browse/TIKA-4754
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
>
> This can shrink the tika-eval jar from 22 MB to 8.5 MB without much change in
> the stats. We could go smaller, but the stats would then differ more because
> of the false positives that Bloom filters are expected to produce.
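
For readers unfamiliar with the trade-off described in the issue, here is a minimal sketch of a Bloom filter holding a set of common tokens. It is not the tika-eval implementation: the class name `BloomFilter`, the helper `expected_false_positive_rate`, the SHA-256 double-hashing scheme, and all sizes used are assumptions chosen for illustration. It shows why a smaller filter saves space but reports more tokens as "common" that were never added.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical, not the tika-eval code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, token: str):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token: str) -> bool:
        # True can be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(token)
        )


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def count_false_positives(bloom: BloomFilter, absent_tokens) -> int:
    """Count tokens reported as present although they were never added."""
    return sum(1 for token in absent_tokens if bloom.might_contain(token))
```

A token that was added is never reported as missing. Shrinking `num_bits` for the same number of tokens raises the false positive rate, which is the source of the larger diff in stats mentioned above.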