This is an automated email from the ASF dual-hosted git repository.
krickert pushed a change to branch OPENNLP-1850-2-tokenizer
in repository https://gitbox.apache.org/repos/asf/opennlp.git
discard d76a28b55 OPENNLP-1850 Address tokenizer review comments
discard 707eadd4f OPENNLP-1850 Address Copilot review on the UAX #29 tokenizer
discard 3226944f0 OPENNLP-1850 UAX #29 word tokenizer and the layered Term model
add fb5edf31f OPENNLP-1850 Make the per-code-point substitution folds offset-aware
add 3c2824c96 OPENNLP-1850 UAX #29 word tokenizer and the layered Term model
add e597e8b82 OPENNLP-1850 Address Copilot review on the UAX #29 tokenizer
add 40cd7299d OPENNLP-1850 Address tokenizer review comments
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (d76a28b55)
            \
             N -- N -- N   refs/heads/OPENNLP-1850-2-tokenizer (40cd7299d)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
No new revisions were added by this update.
Summary of changes:
 .../normalizer/BulletCharSequenceNormalizer.java   |  7 +-
 .../normalizer/DigitCharSequenceNormalizer.java    | 25 ++++++-
 .../normalizer/EllipsisCharSequenceNormalizer.java | 43 +++++++++--
 .../GermanUmlautCharSequenceNormalizer.java        | 65 +++++++++-------
 .../normalizer/QuoteCharSequenceNormalizer.java    | 10 ++-
 .../normalizer/AlignedNormalizerPipelineTest.java  | 86 +++++++++++++++++++++-
 6 files changed, 197 insertions(+), 39 deletions(-)