Re: Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Me
Hi everybody,

UserDictionary is right. I am using the Yahoo Japanese tokenizer API (日本語形態素解析, "Japanese morphological analysis") to build my own user dictionary.

http://developer.yahoo.co.jp/webapi/jlp/

On 2014/03/11, at 8:10, Rahul Ratnakar wrote:
> Worked perfectly for Japanese.
>
> I have the same issue with Chinese Analyzer, I am

Re: Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Rahul Ratnakar
Worked perfectly for Japanese.

I have the same issue with the Chinese analyzer: I am using SmartChinese (lucene-analyzers-smartcn-4.6.0.jar), but I don't see an interface similar to the one the Japanese analyzer offers. Is there an easy way to implement the same for Chinese?

On Mon, Mar 10, 2014 at 3:26 PM, Rahul R

Re: Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Rahul Ratnakar
Thanks, Robert. This was exactly what I was looking for; I will try it.

On Mon, Mar 10, 2014 at 3:13 PM, Robert Muir wrote:
> You can pass UserDictionary with your own entries to do this.
>
> On Mon, Mar 10, 2014 at 3:08 PM, Rahul Ratnakar wrote:
> > Thanks Furkan, This is the exact tool that

Re: Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Robert Muir
You can pass UserDictionary with your own entries to do this.

On Mon, Mar 10, 2014 at 3:08 PM, Rahul Ratnakar wrote:
> Thanks Furkan, This is the exact tool that I am using, albeit in my code, I
> have tried all search modes e.g.
>
> new JapaneseAnalyzer(Version.LUCENE_46, null, JapaneseTokenizer
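For readers following the UserDictionary suggestion above, here is a minimal sketch of how the dictionary text itself could be generated. It assumes the Kuromoji user-dictionary line format of `surface,segmentation,readings,part-of-speech`, with `#` starting a comment. The class `UserDictEntries`, its methods, and the sample entry are hypothetical and only illustrate the format; they are not part of Lucene. Listing the surface form as its own single segment is what asks the tokenizer to keep the phrase together as one token.

```java
import java.util.ArrayList;
import java.util.List;

public class UserDictEntries {

    // Builds one user-dictionary line: surface,segmentation,readings,part-of-speech.
    // The surface form is reused as the only segment, so the phrase stays one token.
    static String entry(String surface, String reading, String posTag) {
        for (String field : new String[] {surface, reading, posTag}) {
            // Commas separate fields and spaces separate segments, so neither is allowed here.
            if (field.isEmpty() || field.contains(",") || field.contains(" ")) {
                throw new IllegalArgumentException("invalid field: '" + field + "'");
            }
        }
        return surface + "," + surface + "," + reading + "," + posTag;
    }

    // Joins entries into dictionary text, one entry per line, after a comment header.
    static String dictionary(List<String> entries) {
        StringBuilder sb = new StringBuilder("# custom entries (hypothetical example)\n");
        for (String e : entries) {
            sb.append(e).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<>();
        // Stand-in entry; replace with the slang phrases you want kept whole.
        entries.add(entry("朝青龍", "アサショウリュウ", "カスタム人名"));
        System.out.print(dictionary(entries));
    }
}
```

The resulting text is what would be loaded into a UserDictionary and passed as the second constructor argument of JapaneseAnalyzer, which is `null` in the snippet quoted above. How the text is loaded (a Reader is assumed here) should be checked against the Javadoc for the Lucene version in use.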

Re: Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Rahul Ratnakar
Thanks, Furkan. This is the exact tool that I am using, albeit in my own code. I have tried all the search modes, e.g.

    new JapaneseAnalyzer(Version.LUCENE_46, null, JapaneseTokenizer.Mode.NORMAL, JapaneseAnalyzer.getDefaultStopSet(), JapaneseAnalyzer.getDefaultStopTags())

    new JapaneseAnalyzer(Version.LUCENE

Re: Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Furkan KAMACI
Hi,

Here is a page with an online Kuromoji tokenizer and more information about it: http://www.atilika.org/

It may help you.

Thanks,
Furkan KAMACI

2014-03-10 19:57 GMT+02:00 Rahul Ratnakar:
> I am trying to analyze some japanese web pages for presence of slang/adult
> phrases in them using lucen

Need help "teaching" Japanese tokenizer to pick up slangs

2014-03-10 Thread Rahul Ratnakar
I am trying to analyze some Japanese web pages for the presence of slang/adult phrases using lucene-analyzers-kuromoji-4.6.0.jar. While the tokenizer breaks the text up into proper words, I am more interested in catching the slang that seems to result from combining various "safe" words. Few