There may be traps, e.g. if you make such a filter with UnicodeSet,
I think you really need to call .freeze() before passing it to this
thing. I have not examined the sources in a while but I think this
might be similar to "compiling a regexp" in that you'll then get good
performance when its lat
Ah thanks! That's very good to know. As it is, I realized we already have an
earlier component where we can handle this (we have a custom ICUTokenizer
rbbi and can just split on "^"). So much flexibility
-Mike
On Mon, Jun 4, 2018 at 10:53 AM, Robert Muir wrote:
Actually, you can now choose to ignore certain characters by using the
Unicode filtering mechanism.
This was added in https://issues.apache.org/jira/browse/LUCENE-8129
So apply a filter such as [^\^] and the filter will ignore ^.
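To make the effect of such a filter concrete, here is a minimal sketch of the idea in plain Java. It is a toy model only, not Lucene's ICUFoldingFilter or ICU4J's API: the class name FilteredFoldingSketch and the simplified folding step (lowercase, drop "^") are assumptions made for illustration. Characters accepted by the filter are folded; everything else passes through untouched.

```java
import java.util.function.IntPredicate;

// Toy model only: NOT Lucene's ICUFoldingFilter and NOT ICU4J's filtered normalizer.
// All names here are hypothetical and exist only for this illustration.
final class FilteredFoldingSketch {

    // Hypothetical stand-in for the folding step: lowercases and drops '^',
    // mimicking the "aaa^bbb" -> "aaabbb" behavior from the original question.
    static void foldCodePoint(int cp, StringBuilder out) {
        if (cp == '^') {
            return; // folded away
        }
        out.appendCodePoint(Character.toLowerCase(cp));
    }

    // Folds only the code points accepted by the filter; the rest are copied as-is.
    static String fold(String input, IntPredicate filter) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (filter.test(cp)) {
                foldCodePoint(cp, out);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // No filter: every character is folded, so '^' disappears.
        System.out.println(fold("Aaa^Bbb", cp -> true));      // prints "aaabbb"
        // Filter equivalent to [^\^]: '^' is outside the set, so it is left alone.
        System.out.println(fold("Aaa^Bbb", cp -> cp != '^')); // prints "aaa^bbb"
    }
}
```

Note that the excluded character is skipped by the folding step entirely, so it survives while the surrounding characters are still folded.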
On Mon, Jun 4, 2018 at 10:41 AM, Robert Muir wrote:
This cannot be "tweaked" at runtime; it is implemented as a custom normalization.
You can modify the sources / build your own ruleset or use a different
tokenfilter to normalize characters.
On Mon, Jun 4, 2018 at 9:07 AM, Michael Sokolov wrote:
Hi, I'm using ICUFoldingFilter and for the most part it does exactly what I
want. However there are some behaviors I'd like to tweak. For example it
maps "aaa^bbb" to "aaabbb". I am trying to understand why it does that, and
whether there is any way to prevent it.
I spent a little time with
http:/
You are right that TopFieldCollector doesn't address some expert use-cases
that EarlyTerminatingSortingCollector used to address. If you need to do
something like this I think it's fine for you to fork
EarlyTerminatingSortingCollector.
Do I get it right that you have two fields A and B and want the