This is an automated email from the ASF dual-hosted git repository.
krickert pushed a change to branch OPENNLP-1850-4-docs
in repository https://gitbox.apache.org/repos/asf/opennlp.git
discard e0011e207 OPENNLP-1850 Docs review nits: populate ids2Labels example;
rule-based hyphenation
discard ce01cc0b1 OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat
in the tokenizer manual
discard f758d8ef6 OPENNLP-1850 Tighten normalizer manual wording (review nits)
discard 2e91c1d48 OPENNLP-1850 Document the supplementary-dash offset shift in
the DL fold options
discard 556adc35d OPENNLP-1850 Document the offset-aware substitution folds
(quotes, digits, ellipsis, bullets, umlaut)
discard 00f4a3a35 OPENNLP-1850 Name the OffsetMappingNameFinder capability
interface in the manual
discard ef6a3c1da OPENNLP-1850 Document the offset-aware normalization
pipeline (buildAligned)
discard 2cf733491 OPENNLP-1850 Document Unicode normalization, the UAX #29
tokenizer, and DL handling
discard b6dc2418f OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in
NameFinderDL
discard 062322e22 OPENNLP-1850 Fail loud on corrupt document-classification
model output
discard 7e1c4e55d OPENNLP-1850 Fail fast on null finder input; fix the GPU
eval test options
discard f3dc9b992 OPENNLP-1850 Harden fail-loud paths in the DL components
discard 284afa576 OPENNLP-1850 Add real-model chunk-boundary eval tests; drop
dead label constants
discard 702d392c8 OPENNLP-1850 Resolve overlapping chunk spans and compose the
input alignment
discard aff3fd44f OPENNLP-1850 Add OffsetMappingNameFinder capability
interface and a findInOriginal end-to-end test
discard e945009ea OPENNLP-1850 Offset-safe, Unicode-aware input normalization
in the DL components
discard b6cd17380 OPENNLP-1850 Per-language NormalizationProfile registry (2c)
discard 57e2b5833 OPENNLP-1850 Layered Term model: Term, TermAnalyzer (2b)
discard a450069ef OPENNLP-1850 UAX #29 word tokenizer: WordSegmenter,
WordTokenizer, WordType (2a)
omit 08de0d35c OPENNLP-1850 Offset/alignment layer: Alignment, AlignedText,
buildAligned, *Aligned (1b)
add 1a6b27387 OPENNLP-1850 Fail loud on a structurally-malformed
confusables line (engine)
add 9af6d92bd OPENNLP-1850 Offset/alignment layer: Alignment, AlignedText,
buildAligned, *Aligned (1b)
add e51758922 OPENNLP-1850 UAX #29 word tokenizer: WordSegmenter,
WordTokenizer, WordType (2a)
add dc02b9e85 OPENNLP-1850 Fail loud on a Word_Break line missing its ';'
(tokenizer)
add 82cb041e8 OPENNLP-1850 Layered Term model: Term, TermAnalyzer (2b)
add be8d26e31 OPENNLP-1850 Per-language NormalizationProfile registry (2c)
add a345f48b2 OPENNLP-1850 Resolve Norwegian nb/nn to the Norwegian
profile (profiles)
add 3c6acd4e1 OPENNLP-1850 Offset-safe, Unicode-aware input normalization
in the DL components
add 85d45f13f OPENNLP-1850 Add OffsetMappingNameFinder capability
interface and a findInOriginal end-to-end test
add 8c206cadc OPENNLP-1850 Resolve overlapping chunk spans and compose the
input alignment
add 46cac9839 OPENNLP-1850 Add real-model chunk-boundary eval tests; drop
dead label constants
add 264b20ac8 OPENNLP-1850 Harden fail-loud paths in the DL components
add fd9479f18 OPENNLP-1850 Fail fast on null finder input; fix the GPU
eval test options
add 4dc78ffad OPENNLP-1850 Fail loud on corrupt document-classification
model output
add 10d43cf7b OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in
NameFinderDL
add be98bd611 OPENNLP-1850 Reject non-finite logits in softmax, not just
NaN (dl)
add 579845327 OPENNLP-1850 Document Unicode normalization, the UAX #29
tokenizer, and DL handling
add f313ee5bb OPENNLP-1850 Document the offset-aware normalization
pipeline (buildAligned)
add 4a1c2e889 OPENNLP-1850 Name the OffsetMappingNameFinder capability
interface in the manual
add c3246e5b5 OPENNLP-1850 Document the offset-aware substitution folds
(quotes, digits, ellipsis, bullets, umlaut)
add 9048af913 OPENNLP-1850 Document the supplementary-dash offset shift in
the DL fold options
add 206934db5 OPENNLP-1850 Tighten normalizer manual wording (review nits)
add 09b67af4d OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat
in the tokenizer manual
add 4a48643dd OPENNLP-1850 Docs review nits: populate ids2Labels example;
rule-based hyphenation
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (e0011e207)
            \
             N -- N -- N   refs/heads/OPENNLP-1850-4-docs (4a48643dd)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
No new revisions were added by this update.
Summary of changes:
.../opennlp/dl/doccat/DocumentCategorizerDL.java | 8 +-
.../dl/doccat/DocumentCategorizerDLTest.java | 13 +++
.../tools/tokenize/uax29/WordBreakProperty.java | 9 +-
.../opennlp/tools/util/normalizer/Confusables.java | 98 ++++++++++++----------
.../util/normalizer/NormalizationProfiles.java | 4 +-
.../tokenize/uax29/WordBreakPropertyTest.java | 16 ++++
.../tools/util/normalizer/ConfusablesLoadTest.java | 53 ++++++++++++
.../util/normalizer/NormalizationProfilesTest.java | 19 ++++-
8 files changed, 169 insertions(+), 51 deletions(-)
 create mode 100644 opennlp-core/opennlp-runtime/src/test/java/opennlp/tools/util/normalizer/ConfusablesLoadTest.java