(opennlp) branch OPENNLP-1850-4-docs updated (d7838a590 -> f75627bf1)

kristian Tue, 23 Jun 2026 06:14:55 -0700

This is an automated email from the ASF dual-hosted git repository.

krickert pushed a change to branch OPENNLP-1850-4-docs
in repository https://gitbox.apache.org/repos/asf/opennlp.git



 discard d7838a590 OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat in the tokenizer manual
 discard 29a600e5f OPENNLP-1850 Tighten normalizer manual wording (review nits)
 discard c0a6e5f9b OPENNLP-1850 Document the supplementary-dash offset shift in the DL fold options
 discard dc44533dc OPENNLP-1850 Document the offset-aware substitution folds (quotes, digits, ellipsis, bullets, umlaut)
 discard 220e6be46 OPENNLP-1850 Name the OffsetMappingNameFinder capability interface in the manual
 discard 1835ee70c OPENNLP-1850 Document the offset-aware normalization pipeline (buildAligned)
 discard e60633c9d OPENNLP-1850 Document Unicode normalization, the UAX #29 tokenizer, and DL handling
 discard 3f77034bb OPENNLP-1850 Fail loud on corrupt document-classification model output
 discard 07f18c467 OPENNLP-1850 Fail fast on null finder input; fix the GPU eval test options
 discard 47a39bf17 OPENNLP-1850 Harden fail-loud paths in the DL components
 discard 5d074ccac OPENNLP-1850 Add real-model chunk-boundary eval tests; drop dead label constants
 discard 07b123286 OPENNLP-1850 Resolve overlapping chunk spans and compose the input alignment
 discard 4e3e8d0b0 OPENNLP-1850 Add OffsetMappingNameFinder capability interface and a findInOriginal end-to-end test
 discard 166bc4d20 OPENNLP-1850 Offset-safe, Unicode-aware input normalization in the DL components
 discard e0ea17cbf OPENNLP-1850 Fail fast on null public-entry arguments (review nits)
 discard b15005612 OPENNLP-1850 Clarify that Extended_Pictographic symbols are kept as emoji
 discard 2860117dc OPENNLP-1850 Address tokenizer review comments
 discard bf37d092f OPENNLP-1850 Address Copilot review on the UAX #29 tokenizer
 discard fe1e77c7c OPENNLP-1850 UAX #29 word tokenizer and the layered Term model
     add 8f1d947dc OPENNLP-1850 Harden andThen insertion mapping docs/tests; label rung index
     add 59043dfea OPENNLP-1850 UAX #29 word tokenizer and the layered Term model
     add f48f50f1f OPENNLP-1850 Address Copilot review on the UAX #29 tokenizer
     add cc89abf52 OPENNLP-1850 Address tokenizer review comments
     add f70c1956a OPENNLP-1850 Clarify that Extended_Pictographic symbols are kept as emoji
     add a75f272f9 OPENNLP-1850 Fail fast on null public-entry arguments (review nits)
     add 7a3c25ac7 OPENNLP-1850 Address review: fail-loud TermAnalyzer default; harden WordBreakProperty
     add bfcbeb5a1 OPENNLP-1850 Offset-safe, Unicode-aware input normalization in the DL components
     add b933a2d97 OPENNLP-1850 Add OffsetMappingNameFinder capability interface and a findInOriginal end-to-end test
     add 7127f0650 OPENNLP-1850 Resolve overlapping chunk spans and compose the input alignment
     add 280966c73 OPENNLP-1850 Add real-model chunk-boundary eval tests; drop dead label constants
     add 706bd2dd9 OPENNLP-1850 Harden fail-loud paths in the DL components
     add 6558e8bc8 OPENNLP-1850 Fail fast on null finder input; fix the GPU eval test options
     add 143cdb72d OPENNLP-1850 Fail loud on corrupt document-classification model output
     add 1ea12ea28 OPENNLP-1850 Fully-qualify TokenNameFinder javadoc links in NameFinderDL
     add d6c31451b OPENNLP-1850 Document Unicode normalization, the UAX #29 tokenizer, and DL handling
     add ba76a160b OPENNLP-1850 Document the offset-aware normalization pipeline (buildAligned)
     add e889bbcc3 OPENNLP-1850 Name the OffsetMappingNameFinder capability interface in the manual
     add 0089a7b3a OPENNLP-1850 Document the offset-aware substitution folds (quotes, digits, ellipsis, bullets, umlaut)
     add 9e1b1506e OPENNLP-1850 Document the supplementary-dash offset shift in the DL fold options
     add 5d4579a0c OPENNLP-1850 Tighten normalizer manual wording (review nits)
     add d8d1b20f6 OPENNLP-1850 Mirror the Extended_Pictographic emoji caveat in the tokenizer manual
     add f75627bf1 OPENNLP-1850 Docs review nits: populate ids2Labels example; rule-based hyphenation

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (d7838a590)
            \
             N -- N -- N   refs/heads/OPENNLP-1850-4-docs (f75627bf1)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.
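
As a rough illustration of the B / O / N picture above, the sketch below
models a force push on a toy, strictly linear history. It is not how the
notification tooling computes its lists; the function names
(first_parent_chain, classify_force_push) and the commit labels are
hypothetical and chosen only to match the diagram.

```python
def first_parent_chain(tip, parents):
    """Walk from tip back to the root through a child -> parent map, tip first."""
    chain = []
    node = tip
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    return chain


def classify_force_push(old_tip, new_tip, parents):
    """Return (common base B, old-only O revisions, new-only N revisions).

    Assumes linear history (one parent per commit), which is a simplification.
    """
    old_chain = first_parent_chain(old_tip, parents)
    new_chain = first_parent_chain(new_tip, parents)
    old_set, new_set = set(old_chain), set(new_chain)
    # The first commit on the old branch that is also reachable from the new tip.
    base = next((c for c in old_chain if c in new_set), None)
    old_only = [c for c in old_chain if c not in new_set]
    new_only = [c for c in new_chain if c not in old_set]
    return base, old_only, new_only


# Toy history mirroring the diagram: A -- B, then O1..O3 (old) and N1..N3 (new).
parents = {
    "A": None, "B": "A",
    "O1": "B", "O2": "O1", "O3": "O2",
    "N1": "B", "N2": "N1", "N3": "N2",
}
base, old_only, new_only = classify_force_push("O3", "N3", parents)
print(base, old_only, new_only)
```

In this toy model the old-only commits correspond to the "discard" entries
(assuming no other reference still points at them) and the new-only commits
to the "add" entries.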

No new revisions were added by this update.

Summary of changes:
 .../opennlp/tools/util/normalizer/Alignment.java   |  9 ++++
 .../tools/util/normalizer/AlignmentTest.java       | 48 ++++++++++++++++++++++
 .../java/opennlp/dl/namefinder/NameFinderDL.java   |  9 ++--
 .../tools/tokenize/uax29/WordBreakProperty.java    |  9 ++--
 .../tools/util/normalizer/TermAnalyzer.java        |  4 ++
 .../tools/util/normalizer/TextNormalizer.java      |  2 +-
 .../uax29/WordBoundaryConformanceTest.java         |  2 -
 .../tokenize/uax29/WordBreakPropertyTest.java      |  3 +-
 .../normalizer/AlignedNormalizerPipelineTest.java  |  4 +-
 opennlp-docs/src/docbkx/namefinder.xml             | 12 +++++-
 opennlp-docs/src/docbkx/tokenizer.xml              |  4 +-
 11 files changed, 88 insertions(+), 18 deletions(-)
