Tim Allison created TIKA-4376:
---------------------------------

    Summary: tika-eval should tokenize on non-breaking/narrow/other space variants
    Key: TIKA-4376
    URL: https://issues.apache.org/jira/browse/TIKA-4376
    Project: Tika
    Issue Type: Task
    Components: tika-eval
    Reporter: Tim Allison

See TIKA-4375.

Many thanks to [~tilman] for identifying this issue and supplying this link: [https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128]
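
For illustration only, here is a minimal Java sketch of what tokenizing on these space variants could look like. It is not tika-eval's actual tokenizer: the class name SpaceVariantTokenizer and the method tokenize are hypothetical, and the choice of separator set is an assumption. The sketch relies on the fact that Java's default \s covers only ASCII whitespace, so the Unicode separator category \p{Z} is added to catch characters such as U+00A0 (no-break space), U+2009 (thin space), U+202F (narrow no-break space) and U+3000 (ideographic space).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: splits text on ASCII whitespace
// and on every Unicode separator character.
public class SpaceVariantTokenizer {

    // \s is ASCII-only by default in java.util.regex; \p{Z} adds the Unicode
    // separator categories (Zs, Zl, Zp), which include U+00A0, U+2000..U+200A,
    // U+202F, U+205F and U+3000.
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\p{Z}]+");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : SEPARATORS.split(text)) {
            // split() can yield an empty first element when the text
            // starts with a separator, so empty parts are skipped.
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "tika\u00A0eval\u2009counts\u202Fcommon\u3000tokens";
        System.out.println(tokenize(sample)); // prints [tika, eval, counts, common, tokens]
    }
}
```

Note that U+200B (zero width space) is a format character rather than a separator, so this pattern does not split on it; whether it should count as a token boundary is a separate decision.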