[ https://issues.apache.org/jira/browse/LUCENE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923279#comment-16923279 ]
Lucene/Solr QA commented on LUCENE-8966:
----------------------------------------
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 43s{color} | {color:green} nori in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 4m 35s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-8966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979538/LUCENE-8966.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 78b6530fb26 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/205/testReport/ |
| modules | C: lucene/analysis/nori U: lucene/analysis/nori |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/205/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
> KoreanTokenizer should split unknown words on digits
> ----------------------------------------------------
>
> Key: LUCENE-8966
> URL: https://issues.apache.org/jira/browse/LUCENE-8966
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Jim Ferenczi
> Priority: Minor
> Attachments: LUCENE-8966.patch
>
>
> Since https://issues.apache.org/jira/browse/LUCENE-8548 the Korean tokenizer
> groups characters of unknown words if they belong to the same script or an
> inherited one. This is fine for inputs like Мoscow (with a Cyrillic М and the
> rest in Latin), but the rule doesn't work well on digits, since they are
> considered common to other scripts. For instance, the input "44사이즈" is kept
> as is even though "사이즈" is part of the dictionary. We should restore the
> original behavior and split any unknown word if a digit is followed by
> another type.
> This issue was first discovered in
> [https://github.com/elastic/elasticsearch/issues/46365]
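
To illustrate the proposed rule, here is a minimal sketch of splitting a chunk wherever a digit is immediately followed by a non-digit character. This is not the attached patch and does not use the KoreanTokenizer internals; the class DigitSplitSketch and the method splitAfterDigits are hypothetical names, and Character.isDigit is assumed here as a stand-in for whatever character classification the tokenizer actually applies.

```java
import java.util.ArrayList;
import java.util.List;

public class DigitSplitSketch {

    // Hypothetical helper, not part of Lucene: cuts the input after every
    // digit that is immediately followed by a non-digit code point.
    static List<String> splitAfterDigits(String chunk) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < chunk.length()) {
            int cp = chunk.codePointAt(i);
            int next = i + Character.charCount(cp);
            // Boundary: current code point is a digit, the following one is not.
            if (next < chunk.length()
                    && Character.isDigit(cp)
                    && !Character.isDigit(chunk.codePointAt(next))) {
                parts.add(chunk.substring(start, next));
                start = next;
            }
            i = next;
        }
        // Emit whatever remains after the last boundary.
        if (start < chunk.length()) {
            parts.add(chunk.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitAfterDigits("44사이즈")); // [44, 사이즈]
        System.out.println(splitAfterDigits("Moscow"));   // [Moscow]
    }
}
```

With this rule, "44사이즈" yields "44" and "사이즈", so the second part can be matched against the dictionary, while input without digits is left as a single chunk. The sketch only splits where a digit is followed by another type, as the description states; it deliberately leaves a non-digit followed by a digit untouched.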
--
This message was sent by Atlassian Jira
(v8.3.2#803003)