(opennlp) branch OPENNLP-1850-4-docs updated (4a48643dd -> 0bb8b2da2)

kristian Wed, 24 Jun 2026 04:55:25 -0700

This is an automated email from the ASF dual-hosted git repository.

krickert pushed a change to branch OPENNLP-1850-4-docs
in repository https://gitbox.apache.org/repos/asf/opennlp.git



 discard 4a48643dd OPENNLP-1850 Docs review nits: populate ids2Labels example; rule-based hyphenation
 discard 09b67af4d OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat in the tokenizer manual
 discard 206934db5 OPENNLP-1850 Tighten normalizer manual wording (review nits)
 discard 9048af913 OPENNLP-1850 Document the supplementary-dash offset shift in the DL fold options
 discard c3246e5b5 OPENNLP-1850 Document the offset-aware substitution folds (quotes, digits, ellipsis, bullets, umlaut)
 discard 4a1c2e889 OPENNLP-1850 Name the OffsetMappingNameFinder capability interface in the manual
 discard f313ee5bb OPENNLP-1850 Document the offset-aware normalization pipeline (buildAligned)
 discard 579845327 OPENNLP-1850 Document Unicode normalization, the UAX #29 tokenizer, and DL handling
 discard be98bd611 OPENNLP-1850 Reject non-finite logits in softmax, not just NaN (dl)
 discard 10d43cf7b OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in NameFinderDL
 discard 4dc78ffad OPENNLP-1850 Fail loud on corrupt document-classification model output
 discard fd9479f18 OPENNLP-1850 Fail fast on null finder input; fix the GPU eval test options
 discard 264b20ac8 OPENNLP-1850 Harden fail-loud paths in the DL components
 discard 46cac9839 OPENNLP-1850 Add real-model chunk-boundary eval tests; drop dead label constants
 discard 8c206cadc OPENNLP-1850 Resolve overlapping chunk spans and compose the input alignment
 discard 85d45f13f OPENNLP-1850 Add OffsetMappingNameFinder capability interface and a findInOriginal end-to-end test
 discard 3c6acd4e1 OPENNLP-1850 Offset-safe, Unicode-aware input normalization in the DL components
 discard a345f48b2 OPENNLP-1850 Resolve Norwegian nb/nn to the Norwegian profile (profiles)
 discard be8d26e31 OPENNLP-1850 Per-language NormalizationProfile registry (2c)
 discard 82cb041e8 OPENNLP-1850 Layered Term model: Term, TermAnalyzer (2b)
 discard dc02b9e85 OPENNLP-1850 Fail loud on a Word_Break line missing its ';' (tokenizer)
 discard e51758922 OPENNLP-1850 UAX #29 word tokenizer: WordSegmenter, WordTokenizer, WordType (2a)
 discard 9af6d92bd OPENNLP-1850 Offset/alignment layer: Alignment, AlignedText, buildAligned, *Aligned (1b)
     add 69f513616 OPENNLP-1850 Dedup expanding folds via CharClass.substitute; share a Lazy holder (engine)
     add 9dc7d5106 OPENNLP-1850 Offset/alignment layer: Alignment, AlignedText, buildAligned, *Aligned (1b)
     add aa3522eb9 OPENNLP-1850 UAX #29 word tokenizer: WordSegmenter, WordTokenizer, WordType (2a)
     add a46b2f3bb OPENNLP-1850 Fail loud on a Word_Break line missing its ';' (tokenizer)
     add dd1906d0e OPENNLP-1850 Route WordBreakProperty/ExtendedPictographic through the shared Lazy holder
     add e35e8591d OPENNLP-1850 Layered Term model: Term, TermAnalyzer (2b)
     add 9cac6484e OPENNLP-1850 Per-language NormalizationProfile registry (2c)
     add 93618ce40 OPENNLP-1850 Resolve Norwegian nb/nn to the Norwegian profile (profiles)
     add 349399083 OPENNLP-1850 Offset-safe, Unicode-aware input normalization in the DL components
     add 6c9a8f192 OPENNLP-1850 Add OffsetMappingNameFinder capability interface and a findInOriginal end-to-end test
     add 004073926 OPENNLP-1850 Resolve overlapping chunk spans and compose the input alignment
     add 61e0100a2 OPENNLP-1850 Add real-model chunk-boundary eval tests; drop dead label constants
     add ba6c5dbc1 OPENNLP-1850 Harden fail-loud paths in the DL components
     add 330325910 OPENNLP-1850 Fail fast on null finder input; fix the GPU eval test options
     add f7f26d376 OPENNLP-1850 Fail loud on corrupt document-classification model output
     add e10be7d9b OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in NameFinderDL
     add d4fec508f OPENNLP-1850 Reject non-finite logits in softmax, not just NaN (dl)
     add 5a1114c0d OPENNLP-1850 Make mergeOverlappingSpans O(n log n) (dl)
     add 9bf85e75b OPENNLP-1850 Document Unicode normalization, the UAX #29 tokenizer, and DL handling
     add 25c027f81 OPENNLP-1850 Document the offset-aware normalization pipeline (buildAligned)
     add b6960500f OPENNLP-1850 Name the OffsetMappingNameFinder capability interface in the manual
     add 7ab13e543 OPENNLP-1850 Document the offset-aware substitution folds (quotes, digits, ellipsis, bullets, umlaut)
     add ec9fc7634 OPENNLP-1850 Document the supplementary-dash offset shift in the DL fold options
     add f24edeb0c OPENNLP-1850 Tighten normalizer manual wording (review nits)
     add 663710ca1 OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat in the tokenizer manual
     add 0bb8b2da2 OPENNLP-1850 Docs review nits: populate ids2Labels example; rule-based hyphenation

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (4a48643dd)
            \
             N -- N -- N   refs/heads/OPENNLP-1850-4-docs (0bb8b2da2)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.
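
As an illustration of the O/N/B terminology above, here is a minimal
sketch in Python. It is not how gitbox or git computes these sets; the
toy commit graph, the commit names, and the helper functions
(ancestors, classify_force_push) are all hypothetical. It treats O as
"reachable from the old tip only" and N as "reachable from the new tip
only".

```python
def ancestors(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def classify_force_push(parents, old_tip, new_tip):
    """Split history into O (old only), N (new only), and shared commits."""
    old = ancestors(parents, old_tip)
    new = ancestors(parents, new_tip)
    return {"O": old - new, "N": new - old, "common": old & new}


# Toy history shaped like the diagram: * -- * -- B, then two diverging lines.
# All names are made up for illustration.
parents = {
    "A1": [], "A2": ["A1"], "B": ["A2"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}

result = classify_force_push(parents, old_tip="O3", new_tip="N3")
print(sorted(result["O"]))  # revisions only on the old branch tip
print(sorted(result["N"]))  # revisions only on the new branch tip
```

In this toy graph the common base B ends up in the shared set, and
whether an O revision is reported as "omit" or "discard" depends on
whether another reference still reaches it, which the sketch does not
model.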

No new revisions were added by this update.

Summary of changes:
 .../opennlp/tools/util/normalizer/CharClass.java   | 61 ++++++++++++++++++++
 .../java/opennlp/dl/namefinder/NameFinderDL.java   | 26 +++++----
 .../tools/tokenize/uax29/ExtendedPictographic.java | 28 +++-------
 .../tools/tokenize/uax29/WordBreakProperty.java    | 30 +++-------
 .../src/main/java/opennlp/tools/util/Lazy.java     | 65 ++++++++++++++++++++++
 .../opennlp/tools/util/normalizer/Confusables.java | 31 +++--------
 .../normalizer/DigitCharSequenceNormalizer.java    | 41 +++-----------
 .../normalizer/EllipsisCharSequenceNormalizer.java | 34 +----------
 .../GermanUmlautCharSequenceNormalizer.java        | 37 +++---------
 9 files changed, 181 insertions(+), 172 deletions(-)
 create mode 100644 opennlp-core/opennlp-runtime/src/main/java/opennlp/tools/util/Lazy.java
