Re: any analyzer will keep punctuation?

2017-03-06 Ralph Soika
What you can do is add a custom search field with the singer name to the document to be indexed: doc.add(new StringField("singername", myValue, Store.NO)); Then you query your index like this: String myquery = "(singername:\"" + searchphrase + "\") OR (" + searchphrase + ")"; in
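
A minimal sketch of building that query string for the classic Lucene QueryParser, assuming classic query syntax (operators must be uppercase, so `OR` rather than `or`). `SingerQuery`, `escape` and `buildSingerQuery` are hypothetical helpers, not Lucene APIs. The search phrase is backslash-escaped so punctuation such as `!` is not read as an operator; Lucene's own `QueryParser.escape` does the same job. Note that a `StringField` is indexed as one untokenized term, so if the parser analyzes `singername` with `StandardAnalyzer` the phrase clause will probably not match. Mapping that field to `KeywordAnalyzer` via `PerFieldAnalyzerWrapper`, or building the clause as a `TermQuery`, avoids this.

```java
// Hypothetical helper (not a Lucene API) that builds the query string from
// the message above for the classic Lucene QueryParser.
class SingerQuery {

    // Characters with special meaning in classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Backslash-escape special characters; Lucene's QueryParser.escape does
    // the same job, hand-rolled here to keep the sketch dependency-free.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Exact match on the untokenized "singername" field OR an ordinary search
    // on the default field. Lowercase "or" would be parsed as a plain term.
    static String buildSingerQuery(String searchphrase) {
        String escaped = escape(searchphrase);
        return "(singername:\"" + escaped + "\") OR (" + escaped + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildSingerQuery("P!nk"));
    }
}
```

Running `main` prints `(singername:"P\!nk") OR (P\!nk)`.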

Re: any analyzer will keep punctuation?

2017-03-06 Ahmet Arslan
Hi Zhao, a whitespace tokeniser followed by a customised WordDelimiterFilterFactory would be a solution. Please see the types attribute of the word delimiter filter for customising which characters are treated as delimiters. Ahmet On Monday, March 6, 2017 12:22 PM, Yonghui Zhao wrote: Yes whitespace analyzer will keep punctuat
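
To illustrate the idea, here is a simplified, dependency-free Java stand-in; it is not Lucene's `WordDelimiterFilter`. It splits on whitespace, then splits each token at any character that is neither a letter nor a digit, unless that character is in a configured keep set. This loosely mirrors remapping a character's type (for example to `ALPHA`) through the filter's types table so it is kept as part of a word. The real filter also splits on case changes and letter/digit transitions and has catenate/preserve options, which this sketch ignores. `KeepPunctTokenizer`, `tokenize` and the keep set are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified stand-in for WhitespaceTokenizer + WordDelimiterFilter with a
// customized character-type table. Hypothetical code, not a Lucene API.
class KeepPunctTokenizer {

    // Characters in keepChars are treated like letters (as if mapped to ALPHA
    // in the types table); any other punctuation acts as a delimiter.
    static List<String> tokenize(String text, Set<Character> keepChars) {
        List<String> tokens = new ArrayList<>();
        // Step 1: whitespace tokenization.
        for (String word : text.trim().split("\\s+")) {
            // Step 2: split each whitespace token into subwords at delimiters.
            StringBuilder current = new StringBuilder();
            for (char c : word.toCharArray()) {
                if (Character.isLetterOrDigit(c) || keepChars.contains(c)) {
                    current.append(c);
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, !!! and P!nk", Set.of('!')));
        System.out.println(tokenize("Hello, !!! and P!nk", Set.of()));
    }
}
```

With `!` in the keep set the first call yields `[Hello, !!!, and, P!nk]`; with an empty keep set the second yields `[Hello, and, P, nk]`, so only the configured punctuation survives while the comma is still stripped.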

Re: any analyzer will keep punctuation?

2017-03-06 Michael McCandless
You could use ICUTokenizer and make a custom RuleBasedBreakIterator .rbbi file to control precisely when splitting should happen, but that language is complex to configure ;) Another option is to maybe make a CharFilter ahead of StandardTokenizer that tries to rewrite the punctuation you want to k
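
A rough sketch of the CharFilter option, assuming a crude tokenizer that, like StandardTokenizer, drops punctuation (it simply splits on anything that is not a letter or digit, rather than following the full Unicode word-break rules). A pre-pass rewrites each configured punctuation character into a letter-only placeholder so it survives tokenization. In Lucene that pre-pass would typically be a `MappingCharFilter` built from a `NormalizeCharMap`; the `ProtectPunctuation` class, its `protect`/`tokenize` helpers and the `xbangx` placeholder are hypothetical. The same mapping has to be applied at query time so query terms line up with indexed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "rewrite punctuation before tokenizing".
class ProtectPunctuation {

    // Pre-pass (the CharFilter's job): replace configured punctuation with
    // letter-only placeholders that the tokenizer will not strip.
    static String protect(String text, Map<Character, String> mapping) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            String replacement = mapping.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Crude stand-in for StandardTokenizer: any non-letter/digit splits tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<Character, String> mapping = Map.of('!', "xbangx");
        System.out.println(tokenize("P!nk, !!!"));
        System.out.println(tokenize(protect("P!nk, !!!", mapping)));
    }
}
```

Without the pre-pass the input yields `[P, nk]` and the name `!!!` disappears entirely; with it, the tokens are `[Pxbangxnk, xbangxxbangxxbangx]`, which stay searchable as long as queries go through the same mapping.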

Re: any analyzer will keep punctuation?

2017-03-06 Yonghui Zhao
Yes, the whitespace analyzer will keep punctuation, but it only breaks words on whitespace. I didn’t explain my requirement clearly: I want an analyzer like the standard analyzer that can keep some configured punctuation. 2017-03-06 18:03 GMT+08:00 Ahmet Arslan: > Hi, > > Whitespace analyser/tokenizer for

Re: any analyzer will keep punctuation?

2017-03-06 Ahmet Arslan
Hi, a whitespace analyser/tokenizer, for example. Ahmet On Monday, March 6, 2017 10:21 AM, Yonghui Zhao wrote: Lucene standard analyzer will remove almost all punctuation. In some cases, we want to keep some punctuation, for example in music search, some singer name and album name could be a punc

any analyzer will keep punctuation?

2017-03-06 Yonghui Zhao
Lucene's standard analyzer removes almost all punctuation. In some cases we want to keep some punctuation; for example, in music search, some singer names and album names consist of punctuation. Is there any analyzer that lets us customize which punctuation is removed?