(opennlp) branch OPENNLP-1850-4-docs updated (c0fb2ebcb -> 58bbde6be)

kristian Thu, 25 Jun 2026 10:25:17 -0700

This is an automated email from the ASF dual-hosted git repository.

krickert pushed a change to branch OPENNLP-1850-4-docs
in repository https://gitbox.apache.org/repos/asf/opennlp.git



 discard c0fb2ebcb OPENNLP-1850 Docs review nits: populate ids2Labels example; 
rule-based hyphenation
 discard e5b4d887f OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat 
in the tokenizer manual
 discard 783dc4408 OPENNLP-1850 Tighten normalizer manual wording (review nits)
 discard f2d6d0e30 OPENNLP-1850 Document the supplementary-dash offset shift in 
the DL fold options
 discard 8bb30116c OPENNLP-1850 Document the offset-aware substitution folds 
(quotes, digits, ellipsis, bullets, umlaut)
 discard d514e0a74 OPENNLP-1850 Name the OffsetMappingNameFinder capability 
interface in the manual
 discard 0c3b7f01d OPENNLP-1850 Document the offset-aware normalization 
pipeline (buildAligned)
 discard 7640014db OPENNLP-1850 Document Unicode normalization, the UAX #29 
tokenizer, and DL handling
 discard 57ceefdb2 OPENNLP-1850 Make mergeOverlappingSpans O(n log n) (dl)
 discard 0bd4f058f OPENNLP-1850 Reject non-finite logits in softmax, not just 
NaN (dl)
 discard 2e4e04175 OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in 
NameFinderDL
 discard 22e9314ee OPENNLP-1850 Fail loud on corrupt document-classification 
model output
 discard d7d3f2813 OPENNLP-1850 Fail fast on null finder input; fix the GPU 
eval test options
 discard 553b89acf OPENNLP-1850 Harden fail-loud paths in the DL components
 discard 808535466 OPENNLP-1850 Add real-model chunk-boundary eval tests; drop 
dead label constants
 discard dd9b495a3 OPENNLP-1850 Resolve overlapping chunk spans and compose the 
input alignment
 discard cb99cd4e5 OPENNLP-1850 Add OffsetMappingNameFinder capability 
interface and a findInOriginal end-to-end test
 discard da0d84fd0 OPENNLP-1850 Offset-safe, Unicode-aware input normalization 
in the DL components
 discard 70ec7a3df OPENNLP-1850 Resolve Norwegian nb/nn to the Norwegian 
profile (profiles)
 discard f600dfd55 OPENNLP-1850 Per-language NormalizationProfile registry (2c)
 discard 55dbeb4b2 OPENNLP-1850 Layered Term model: Term, TermAnalyzer (2b)
 discard 3fae8aad6 OPENNLP-1850 Fail loud on a Word_Break line missing its ';' 
(tokenizer)
 discard 47480171c OPENNLP-1850 UAX #29 word tokenizer: WordSegmenter, 
WordTokenizer, WordType (2a)
 discard b24c9ee3d OPENNLP-1850 Offset/alignment layer: Alignment, AlignedText, 
buildAligned, *Aligned (1b)
     add 124d8526e OPENNLP-1850 Review nits: align Confusables to 
IllegalArgumentException; pom newline (engine)
     add 1d8f582c0 OPENNLP-1850 Offset/alignment layer: Alignment, AlignedText, 
buildAligned, *Aligned (1b)
     add 702acc52f OPENNLP-1850 Review nits: soften DL forward-link; fix 
LineBreakPreserving opener (alignment)
     add c19c4fc11 OPENNLP-1850 UAX #29 word tokenizer: WordSegmenter, 
WordTokenizer, WordType (2a)
     add 57b77648e OPENNLP-1850 Fail loud on a Word_Break line missing its ';' 
(tokenizer)
     add f2d1d8cca OPENNLP-1850 Review nits: ExtendedPictographic fail-loud 
parity + doc; WordType heuristic note (tokenizer)
     add 58cff0120 OPENNLP-1850 Layered Term model: Term, TermAnalyzer (2b)
     add a23a51358 OPENNLP-1850 Review nits: rename dashes()->dash(); LEMMA 
doc+test; soften forward-link (Term)
     add 8d32dbac9 OPENNLP-1850 Per-language NormalizationProfile registry (2c)
     add 859146c9c OPENNLP-1850 Resolve Norwegian nb/nn to the Norwegian 
profile (profiles)
     add 13e46418b OPENNLP-1850 Review nits: add Turkish profile; derive 
coverage from the enum (profiles)
     add 0ec30f676 OPENNLP-1850 Offset-safe, Unicode-aware input normalization 
in the DL components
     add 9d28db088 OPENNLP-1850 Add OffsetMappingNameFinder capability 
interface and a findInOriginal end-to-end test
     add 76003e15a OPENNLP-1850 Resolve overlapping chunk spans and compose the 
input alignment
     add a4444192e OPENNLP-1850 Add real-model chunk-boundary eval tests; drop 
dead label constants
     add 4e4713263 OPENNLP-1850 Harden fail-loud paths in the DL components
     add 498539b5c OPENNLP-1850 Fail fast on null finder input; fix the GPU 
eval test options
     add 74f3c6425 OPENNLP-1850 Fail loud on corrupt document-classification 
model output
     add ee0e294fc OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in 
NameFinderDL
     add c799b80bf OPENNLP-1850 Reject non-finite logits in softmax, not just 
NaN (dl)
     add b9d6972bc OPENNLP-1850 Make mergeOverlappingSpans O(n log n) (dl)
     add 2f4a5ab6f OPENNLP-1850 Review nits: extract testable DL guards; 
merge-copy; capitalize msgs; migration note
     add acf5077c9 OPENNLP-1850 Document Unicode normalization, the UAX #29 
tokenizer, and DL handling
     add 6c5f00a8a OPENNLP-1850 Document the offset-aware normalization 
pipeline (buildAligned)
     add 304799432 OPENNLP-1850 Name the OffsetMappingNameFinder capability 
interface in the manual
     add 7b572a817 OPENNLP-1850 Document the offset-aware substitution folds 
(quotes, digits, ellipsis, bullets, umlaut)
     add 32c4dee36 OPENNLP-1850 Document the supplementary-dash offset shift in 
the DL fold options
     add 9c1376325 OPENNLP-1850 Tighten normalizer manual wording (review nits)
     add 3ebd18dbd OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat 
in the tokenizer manual
     add f8ed41fd5 OPENNLP-1850 Docs review nits: populate ids2Labels example; 
rule-based hyphenation
     add 58bbde6be OPENNLP-1850 Docs review nits: declare xmlns:xlink; populate 
second ids2Labels example

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (c0fb2ebcb)
            \
             N -- N -- N   refs/heads/OPENNLP-1850-4-docs (58bbde6be)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.
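
As a rough illustration of the O and N sets described above, the sketch below models the diagram as a toy parent graph and computes which revisions are reachable only from the old tip and which only from the new tip. The commit names (S1, S2, O1..O3, N1..N3) are hypothetical placeholders taken from the diagram's letters, not real hashes, and this is not how the notification script itself is implemented. Whether an O revision ends up marked "discard" or "omit" additionally depends on whether any other reference still reaches it, which this sketch does not model.

```python
# Toy model of the diagram above. All commit names are illustrative
# placeholders (assumptions), not hashes from the repository.
parents = {
    "S1": [],
    "S2": ["S1"],
    "B": ["S2"],                                   # common base
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],       # old branch tip is O3
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],       # new branch tip is N3
}


def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


old_set = reachable("O3", parents)
new_set = reachable("N3", parents)

only_old = old_set - new_set   # the O revisions (no longer on the branch)
only_new = new_set - old_set   # the N revisions (described by later emails)
shared = old_set & new_set     # history up to and including B

print("O revisions:", sorted(only_old))
print("N revisions:", sorted(only_new))
print("shared:", sorted(shared))
```

The two set differences correspond to the "discard"/"omit" and "add" entries listed earlier in this message.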

No new revisions were added by this update.

Summary of changes:
 opennlp-api/pom.xml                                |  2 +-
 .../util/normalizer/OffsetAwareNormalizer.java     |  2 +-
 opennlp-core/opennlp-ml/opennlp-dl/README.md       | 16 ++++++++++++
 .../opennlp/dl/doccat/DocumentCategorizerDL.java   | 19 +++++++++++---
 .../java/opennlp/dl/namefinder/NameFinderDL.java   |  4 ++-
 .../dl/doccat/DocumentCategorizerDLTest.java       | 30 ++++++++++++++++++++++
 .../opennlp/dl/namefinder/NameFinderDLTest.java    | 12 +++++++++
 .../tools/tokenize/uax29/ExtendedPictographic.java | 26 +++++++++++++------
 .../opennlp/tools/tokenize/uax29/WordType.java     |  4 ++-
 .../opennlp/tools/util/normalizer/Confusables.java |  4 +--
 ...PreservingWhitespaceCharSequenceNormalizer.java |  2 +-
 .../util/normalizer/NormalizationProfiles.java     |  9 ++++++-
 .../tools/util/normalizer/TermAnalyzer.java        | 11 ++++----
 .../tokenize/uax29/ExtendedPictographicTest.java   | 16 ++++++++++++
 .../tools/util/normalizer/ConfusablesLoadTest.java |  2 +-
 .../util/normalizer/NormalizationProfilesTest.java | 25 +++++++++++++++---
 .../tools/util/normalizer/TermAnalyzerTest.java    | 23 +++++++++++++++++
 opennlp-docs/src/docbkx/namefinder.xml             |  9 +++++++
 opennlp-docs/src/docbkx/normalizer.xml             |  2 +-
 opennlp-docs/src/docbkx/tokenizer.xml              |  2 +-
 20 files changed, 188 insertions(+), 32 deletions(-)
