Re: ICUFoldingFilter

2018-06-04 Thread Robert Muir
There may be traps, e.g. if you make such a filter with UnicodeSet, I think you really need to call .freeze() before passing it to this thing. I have not examined the sources in a while, but I think this might be similar to "compiling a regexp" in that you'll then get good performance when its lat
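
The "compiling a regexp" comparison can be illustrated with the JDK's own java.util.regex: the pattern is compiled once and the compiled object is reused for every token. This is only a sketch of the analogy. It does not use ICU4J or UnicodeSet, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of "compile once, reuse many times".
class PrecompiledPatternSketch {
    // Compiled a single time; matches strings that contain no '^' at all.
    private static final Pattern NO_CARET = Pattern.compile("[^\\^]+");

    // Counts the tokens that consist entirely of non-caret characters.
    static int countWithoutCaret(List<String> tokens) {
        int count = 0;
        for (String token : tokens) {
            // Reuses the precompiled pattern instead of recompiling per token.
            if (NO_CARET.matcher(token).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithoutCaret(Arrays.asList("aaa", "aaa^bbb", "bbb")));
    }
}
```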

Re: ICUFoldingFilter

2018-06-04 Thread Michael Sokolov
Ah thanks! That's very good to know. As it is, I realized we already have an earlier component where we can handle this (we have a custom ICUTokenizer rbbi and can just split on "^"). So much flexibility! -Mike On Mon, Jun 4, 2018 at 10:53 AM, Robert Muir wrote: > actually, you now can choose to

Re: ICUFoldingFilter

2018-06-04 Thread Robert Muir
Actually, you can now choose to ignore certain characters by using the Unicode filtering mechanism. This was added in LUCENE-8129 (https://issues.apache.org/jira/browse/LUCENE-8129). Apply a Unicode set filter such as [^\^] and ICUFoldingFilter will ignore ^. On Mon, Jun 4, 2018 at 10:41 AM, Robert Muir wrote: > This cannot be
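
The idea of restricting folding to a character set such as [^\^] can be sketched without Lucene or ICU4J. The code below is a JDK-only stand-in, not the Lucene implementation: toyFold is a hypothetical, simplified folding (lowercasing, stripping combining marks, dropping "^") that only imitates the behavior described in this thread, and foldFiltered applies it solely to runs of code points that the filter accepts.

```java
import java.text.Normalizer;
import java.util.function.IntPredicate;

// Hypothetical JDK-only sketch; it does not use the UTR#30 data behind ICUFoldingFilter.
class FilteredFoldingSketch {
    // Toy folding: decompose, drop non-spacing marks and '^', lowercase the rest.
    static String toyFold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < decomposed.length(); ) {
            int cp = decomposed.codePointAt(i);
            i += Character.charCount(cp);
            boolean isMark = Character.getType(cp) == Character.NON_SPACING_MARK;
            if (isMark || cp == '^') {
                continue; // removed by the toy folding
            }
            out.appendCodePoint(Character.toLowerCase(cp));
        }
        return out.toString();
    }

    // Folds only runs of code points accepted by the filter; all others pass through unchanged.
    static String foldFiltered(String s, IntPredicate filter) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (filter.test(cp)) {
                run.appendCodePoint(cp);
            } else {
                out.append(toyFold(run.toString()));
                run.setLength(0);
                out.appendCodePoint(cp); // outside the filter: left untouched
            }
        }
        out.append(toyFold(run.toString()));
        return out.toString();
    }

    public static void main(String[] args) {
        IntPredicate notCaret = cp -> cp != '^'; // plays the role of the set [^\^]
        System.out.println(toyFold("aaa^bbb"));                // aaabbb
        System.out.println(foldFiltered("aaa^bbb", notCaret)); // aaa^bbb
    }
}
```

With the filter in place, "^" never reaches the folding step, so the rest of the token is still folded while the caret survives.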

Re: ICUFoldingFilter

2018-06-04 Thread Robert Muir
This cannot be "tweaked" at runtime; it is implemented as a custom normalization. You can modify the sources and build your own ruleset, or use a different tokenfilter to normalize characters. On Mon, Jun 4, 2018 at 9:07 AM, Michael Sokolov wrote: > Hi, I'm using ICUFoldingFilter and for the most par

ICUFoldingFilter

2018-06-04 Thread Michael Sokolov
Hi, I'm using ICUFoldingFilter and for the most part it does exactly what I want. However, there are some behaviors I'd like to tweak. For example, it maps "aaa^bbb" to "aaabbb". I am trying to understand why it does that, and whether there is any way to prevent it. I spent a little time with http:/

Re: EarlyTerminatingSortingCollector is expired in lucene 7.2.1

2018-06-04 Thread Adrien Grand
You are right that TopFieldCollector doesn't address some expert use-cases that EarlyTerminatingSortingCollector used to address. If you need to do something like this, I think it's fine for you to fork EarlyTerminatingSortingCollector. Do I understand correctly that you have two fields A and B and want the