Hi everybody
UserDictionary is right.
I am using the Yahoo! Japan tokenizer API (日本語形態素解析, "Japanese morphological analysis") to build my own user
dictionary.
http://developer.yahoo.co.jp/webapi/jlp/
On 2014/03/11, at 8:10, Rahul Ratnakar wrote:
Worked perfectly for Japanese.
I have the same issue with the Chinese analyzer. I am using SmartChinese
(lucene-analyzers-smartcn-4.6.0.jar), but I don't see an interface similar to
the Japanese analyzer's. Is there an easy way to implement the same for
Chinese?
On Mon, Mar 10, 2014 at 3:26 PM, Rahul R
Thanks Robert. This was exactly what I was looking for, will try this.
On Mon, Mar 10, 2014 at 3:13 PM, Robert Muir wrote:
You can pass UserDictionary with your own entries to do this.
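For anyone following along, here is a minimal sketch of preparing such entries. It assumes the Kuromoji user dictionary CSV layout as I understand it (surface form, space-separated segmentation, space-separated readings, part-of-speech tag), where repeating the surface form as its own segmentation should keep the phrase as a single token. The class and method names (UserDictEntries, singleToken, add, toText) are made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class) that builds user dictionary text. */
final class UserDictEntries {
    private final List<String> lines = new ArrayList<>();

    /**
     * Formats one entry that keeps {@code surface} as a single token.
     * Assumed layout: surface,segmentation,readings,part-of-speech tag.
     */
    static String singleToken(String surface, String reading, String posTag) {
        for (String field : new String[] {surface, reading, posTag}) {
            // Commas would break the CSV layout; empty fields are meaningless.
            if (field == null || field.isEmpty() || field.contains(",")) {
                throw new IllegalArgumentException("bad field: " + field);
            }
        }
        // The segmentation column repeats the surface form, so it is not split.
        return String.join(",", surface, surface, reading, posTag);
    }

    /** Appends one single-token entry and returns this builder for chaining. */
    UserDictEntries add(String surface, String reading, String posTag) {
        lines.add(singleToken(surface, reading, posTag));
        return this;
    }

    /** Returns all entries, one per line, each terminated by a newline. */
    String toText() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```

For example, add("朝青龍", "アサショウリュウ", "カスタム人名") yields the line 朝青龍,朝青龍,アサショウリュウ,カスタム人名. In Lucene 4.6 the resulting text could then, I believe, be wrapped in a StringReader, handed to the UserDictionary(Reader) constructor, and the result passed as the second argument of JapaneseAnalyzer in place of null. Please check the javadocs for your version before relying on that.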
On Mon, Mar 10, 2014 at 3:08 PM, Rahul Ratnakar
wrote:
Thanks Furkan. This is the exact tool that I am using, although in my own code. I
have tried all of the search modes, e.g.
new JapaneseAnalyzer(Version.LUCENE_46, null, JapaneseTokenizer.Mode.NORMAL,
JapaneseAnalyzer.getDefaultStopSet(), JapaneseAnalyzer.getDefaultStopTags())
new JapaneseAnalyzer(Version.LUCENE
Hi;
Here is its page, which has an online Kuromoji tokenizer and more
information: http://www.atilika.org/ It may help you.
Thanks;
Furkan KAMACI
2014-03-10 19:57 GMT+02:00 Rahul Ratnakar :
I am trying to analyze some Japanese web pages for the presence of slang/adult
phrases using lucene-analyzers-kuromoji-4.6.0.jar. While the
tokenizer breaks the text up into proper words, I am more interested in
catching the slang, which seems to result from combining various "safe"
words.
Few