Hi,
I understand that Lucene analyzers work on a per-field basis. But I wonder
whether it's even possible for an analyzer on field A to access data in field B
at any stage of the indexing process, say CharFilter, Tokenizer or TokenFilter?
I'd like to control the behavior of the indexing process fo
Hello Guan.
It reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (from that timecode).
I'm afraid that's quite far from the existing codebase, where a Field has no
reference to its enclosing Document. Sigh.
On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan wrote:
> Hi,
>
> I understand Lucene analyzer is per field basis.
Hi Mikhail,
Thank you for the definitive answer!
I could "solve" this by adding a header to the document with the information
needed to guide the indexing process. The header would be parsed and then
ignored by the tokenizer. However, the header along with the actual text will be stored
together in that fi
Well.. maybe something like
https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html
?
On Mon, Apr 24, 2023 at 11:40 PM Wang, Guan wrote:
> Hi Mikhail,
>
> Thank you for the definitive answer!
>
> I could "solve" this by adding a
Hi Mikhail,
Thank you for introducing the abstract class ConditionalTokenFilter to me! I
took a quick look: it wraps the upstream TokenStream and applies a filter to it
conditionally.
So, if I have a document like:
HEADER
TEXT
TEXT
An implementation of ConditionalTokenFilter could tokenize only lines 2 and 3.
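
To make the header idea from earlier in the thread concrete, here is a minimal
sketch in plain Java with no Lucene dependency. Everything in it is an
assumption made for illustration: the class name HeaderGuidedText, the
"key=value;key=value" header format, the "lowercase" directive, and the
whitespace split that stands in for a real tokenizer. It only shows the shape
of "parse the first line, then tokenize the rest"; it is not how Lucene does
it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: splits a "HEADER\nTEXT..." value
// into header directives and the body that should actually be tokenized.
final class HeaderGuidedText {
    final Map<String, String> directives; // parsed from the first line
    final String body;                    // everything after the first line

    private HeaderGuidedText(Map<String, String> directives, String body) {
        this.directives = directives;
        this.body = body;
    }

    // Assumed header format (invented for this sketch): "key=value;key=value".
    // The first line is always treated as the header.
    static HeaderGuidedText parse(String raw) {
        int nl = raw.indexOf('\n');
        String header = nl < 0 ? raw : raw.substring(0, nl);
        String body = nl < 0 ? "" : raw.substring(nl + 1);

        Map<String, String> directives = new LinkedHashMap<>();
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                directives.put(pair.substring(0, eq).trim(),
                               pair.substring(eq + 1).trim());
            }
        }
        return new HeaderGuidedText(directives, body);
    }

    // Stand-in for a real tokenizer: splits the body on whitespace and
    // lowercases tokens only when the header says lowercase=true.
    List<String> tokens() {
        boolean lower = "true".equals(directives.get("lowercase"));
        List<String> out = new ArrayList<>();
        for (String t : body.split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(lower ? t.toLowerCase(Locale.ROOT) : t);
            }
        }
        return out;
    }
}
```

If the split is done in application code before the Document is built, only
the body needs to go into the field, so the header would not be stored with the
text. Whether that fits depends on where the guiding information has to be
available.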