[
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868971#comment-13868971
]
SooMyung Lee commented on LUCENE-4956:
--------------------------------------
[~thetaphi] I'm trying to change the code to use StandardTokenizer, but I found
a problem. When text containing Chinese characters is passed into
StandardTokenizer, it tokenizes each Chinese character as a separate token.
That makes it difficult to extract index keywords and to map Chinese
characters to their Hangul equivalents. So, for StandardTokenizer to work with
the KoreanAnalyzer, runs of consecutive Chinese characters should not be split
apart. Can you change StandardTokenizer accordingly?
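For reference, here is a minimal sketch against the Lucene 4.2 API (the class
name and sample text are mine) that demonstrates the behavior: StandardTokenizer
emits each Han ideograph as its own token, while a Hangul run stays together.

{code:java}
import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StandardTokenizerHanDemo {
  public static void main(String[] args) throws Exception {
    // "韓國語" is a run of three Han ideographs; "분석" is Hangul.
    StandardTokenizer tokenizer =
        new StandardTokenizer(Version.LUCENE_42, new StringReader("韓國語 분석"));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      // Prints 韓, 國, 語, 분석 -- the Han run is split per character,
      // while the Hangul run is kept as a single token.
      System.out.println(term.toString());
    }
    tokenizer.end();
    tokenizer.close();
  }
}
{code}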
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.2
> Reporter: SooMyung Lee
> Assignee: Christian Moen
> Labels: newbie
> Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar,
> lucene-4956.patch, lucene4956.patch
>
>
> Korean has specific characteristics. When developing a search service with
> Lucene & Solr in Korean, there are problems in both searching and indexing.
> The Korean analyzer solves these problems with a Korean morphological
> analyzer. It consists of a Korean morphological analyzer, dictionaries, a
> Korean tokenizer, and a Korean filter. The Korean analyzer is built for
> Lucene and Solr. If you are developing a Korean-language search service with
> Lucene, the Korean analyzer is the best choice.
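As a rough illustration of the composition described in the issue, here is a
minimal sketch of how a tokenizer plus filter chain plugs into a Lucene 4.x
Analyzer. StandardTokenizer and LowerCaseFilter are stand-ins marked in the
comments; the patch's actual Korean tokenizer and morphological filter would
take their places.

{code:java}
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public final class AnalyzerCompositionSketch extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // A Korean analyzer would use its own tokenizer here in place of
    // StandardTokenizer, and its dictionary-backed morphological filter
    // in place of LowerCaseFilter.
    Tokenizer source = new StandardTokenizer(Version.LUCENE_42, reader);
    TokenStream result = new LowerCaseFilter(Version.LUCENE_42, source);
    return new TokenStreamComponents(source, result);
  }
}
{code}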