RE: Integrating NLP into Lucene Analysis Chain

2022-11-21 Wang, Guan
Hi Luke, regarding what you described as a "bug" in NLPPOSTaggerOp, I do agree with you that there could be a more elegant solution than simply synchronizing the entire method. That being said, IMHO, I don't see a thread-safety issue. Lucene TokenFilters are not supposed to be shared am
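The thread-confinement argument can be illustrated without Lucene at all. In the minimal sketch below, `PosTagger` is a hypothetical stateful tagger (it is not Lucene's NLPPOSTaggerOp, and its tagging rule is a toy), and a `ThreadLocal` plays roughly the role of an Analyzer caching its token stream components per thread. Because each thread only ever touches its own instance, the tagging method needs no `synchronized`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadConfinedTagger {

    // Hypothetical stateful tagger (NOT Lucene's NLPPOSTaggerOp): it reuses an internal
    // buffer between calls, so one shared instance would be unsafe without locking.
    static final class PosTagger {
        private final StringBuilder scratch = new StringBuilder();

        String[] tag(String[] words) {
            String[] tags = new String[words.length];
            for (int i = 0; i < words.length; i++) {
                scratch.setLength(0);
                // Toy rule standing in for a real model: capitalized words are NNP, the rest NN.
                scratch.append(Character.isUpperCase(words[i].charAt(0)) ? "NNP" : "NN");
                tags[i] = scratch.toString();
            }
            return tags;
        }
    }

    // One tagger per thread, analogous to per-thread reuse of analysis components.
    private static final ThreadLocal<PosTagger> TAGGER = ThreadLocal.withInitial(PosTagger::new);

    // No synchronized needed: each thread only ever uses its own PosTagger.
    public static String[] tag(String[] words) {
        return TAGGER.get().tag(words);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String[]>> results = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            results.add(pool.submit(() -> tag(new String[] {"Lucene", "indexes", "text"})));
        }
        int consistent = 0;
        for (Future<String[]> f : results) {
            if (String.join(" ", f.get()).equals("NNP NN NN")) consistent++;
        }
        pool.shutdown();
        System.out.println(consistent + "/100 consistent");
    }
}
```

Running `main` prints `100/100 consistent`. The caveat is that confinement only holds while the stateful object is created per filter (or per thread); if a factory handed one shared instance to every filter it creates, some form of locking would be needed again.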

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Mikhail Khludnev
Hello, Benoit. I just came across https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/TypeAsSynonymFilterFactory.html It sounds similar to what you are asking for, but it watches TypeAttribute only. Also, spans are superseded by intervals https://lucene.apache
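The idea behind the linked filter, as its documentation describes it, is to add each token's type as a synonym, i.e. an extra token stacked at the same position (position increment 0), optionally with a prefix. The sketch below models that on a plain list; it is not Lucene code, `Token` is a simplified stand-in for the token attributes, and the prefix `_pos_` is a hypothetical choice.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeAsSynonymSketch {

    // Simplified token model: term text, type label, and position increment.
    public static final class Token {
        public final String term;
        public final String type;
        public final int posInc;

        public Token(String term, String type, int posInc) {
            this.term = term;
            this.type = type;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Emit each token, then its (prefixed) type as a stacked token at the same position.
    // The stacked token simply copies the original type to keep the model small.
    public static List<Token> typeAsSynonym(List<Token> in, String prefix) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(t);
            out.add(new Token(prefix + t.type, t.type, 0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = List.of(
                new Token("dogs", "NNS", 1),
                new Token("bark", "VBP", 1));
        System.out.println(typeAsSynonym(in, "_pos_"));
        // [dogs(+1), _pos_NNS(+0), bark(+1), _pos_VBP(+0)]
    }
}
```

Because `_pos_NNS` shares a position with `dogs`, positional queries (spans or intervals) can in principle mix surface terms and type annotations, which is the kind of annotation search discussed in this thread.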

Re: Integrating NLP into Lucene Analysis Chain

2022-11-21 Benoit Mercier
Hi Luke, Thank you for your work and for sharing this information. From my point of view, lemmatization is just one use case of text token annotation. I have been working with Lucene since 2006 to index lexicographic and linguistic data, and I have always regretted that (1) token attributes are not search

Re: Sort by numeric field, order missing values before anything else

2022-11-21 Adrien Grand
Uwe, I think that Petko's question was about making sure that missing values would be returned before non-missing values, even though some of these non-missing values might be equal to Long.MIN_VALUE, which isn't possible today. I agree with your recommendation against going with bytes given the o
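The difference between the two requirements can be shown with plain Java collections (this is not Lucene's sort API). Substituting Long.MIN_VALUE for missing values makes a missing document tie with a document whose real value is Long.MIN_VALUE, so their relative order falls to a tiebreaker (doc id here, an assumption of the sketch). A strict "missing first" ordering, expressed with `Comparator.nullsFirst`, always puts missing values ahead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MissingFirstSketch {

    // A document id plus an optional long value (null means the field is missing).
    public static final class Doc {
        public final int id;
        public final Long value;

        public Doc(int id, Long value) {
            this.id = id;
            this.value = value;
        }
    }

    // Sentinel approach: treat missing as Long.MIN_VALUE, break ties by doc id.
    public static List<Integer> sortWithSentinel(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator
                .comparingLong((Doc d) -> d.value == null ? Long.MIN_VALUE : d.value)
                .thenComparingInt(d -> d.id));
        return ids(sorted);
    }

    // Strict approach: every missing value sorts before every present value.
    public static List<Integer> sortMissingFirst(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator
                .comparing((Doc d) -> d.value, Comparator.nullsFirst(Comparator.<Long>naturalOrder()))
                .thenComparingInt(d -> d.id));
        return ids(sorted);
    }

    private static List<Integer> ids(List<Doc> docs) {
        List<Integer> out = new ArrayList<>();
        for (Doc d : docs) out.add(d.id);
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(0, Long.MIN_VALUE),
                new Doc(1, null),
                new Doc(2, 42L));
        System.out.println(sortWithSentinel(docs)); // [0, 1, 2]: missing doc 1 ties with doc 0
        System.out.println(sortMissingFirst(docs)); // [1, 0, 2]: missing doc 1 comes first
    }
}
```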

Re: Sort by numeric field, order missing values before anything else

2022-11-21 Uwe Schindler
Hi, Long.MIN_VALUE and Long.MAX_VALUE are the correct missing values to use when sorting longs. In fact, if you have Long.MIN_VALUE in your collection, empty values tie with it, but empty values will still appear at the desired place. In contrast to the default "0", they do not end up somewhere in the middle. Beca
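The effect of the sentinel choice can be sketched with a small stand-alone Java helper; in Lucene the equivalent knob is the missing value set on the sort field (`SortField#setMissingValue`), but the code below does not use Lucene. The helper `order` is hypothetical: it replaces missing (null) entries with the given sentinel and returns the original indexes in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingSentinelSketch {

    // Sort ascending after replacing missing (null) entries with the given sentinel.
    // Returns original indexes in sorted order; ties keep input order (List.sort is stable).
    public static List<Integer> order(List<Long> values, long missingValue) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) idx.add(i);
        idx.sort(Comparator.comparingLong(i -> values.get(i) == null ? missingValue : values.get(i)));
        return idx;
    }

    public static void main(String[] args) {
        // Index 1 is the document without a value.
        List<Long> values = Arrays.asList(-7L, null, 3L);
        System.out.println(order(values, 0L));             // [0, 1, 2]: missing lands between -7 and 3
        System.out.println(order(values, Long.MIN_VALUE)); // [1, 0, 2]: missing sorts first
    }
}
```

With a default of 0 the missing document sits among the real values, whereas Long.MIN_VALUE moves it to the front; the only residual ambiguity is a tie with a real Long.MIN_VALUE.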