Can an analyzer access other field's data during index time?

2023-04-24 Thread Wang, Guan
Hi, I understand that a Lucene analyzer works on a per-field basis. But I wonder whether an analyzer on field A can access data in field B at any stage of the indexing process, say CharFilter, Tokenizer, or TokenFilter? I'd like to control the behavior of the indexing process fo

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
Hello Guan. This reminds me of https://youtu.be/EkkzSLstSAE?t=1531 (at that timecode). I'm afraid it's quite far from the existing codebase, where a Field has no reference to its enclosing Document. Sigh.

RE: Can an analyzer access other field's data during index time?

2023-04-24 Thread Wang, Guan
Hi Mikhail, Thank you for the definitive answer! I could "solve" this by adding a header to the document with the information needed to guide the indexing process. The header will be parsed and then ignored by the tokenizer. However, the header along with the actual text will be stored together in that fi
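
A minimal sketch of the header idea described above, in plain Java with no Lucene dependency. It assumes the header is the first line of the field value and that everything after the first newline is the text to tokenize. The class name `HeaderSplitSketch`, the header value `lowercase`, and whitespace tokenization are all hypothetical choices made for illustration; in Lucene the same parsing would presumably live in a custom CharFilter or Tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HeaderSplitSketch {

    /** Holds the control header and the text that should actually be tokenized. */
    public static final class Parsed {
        public final String header;
        public final String body;

        Parsed(String header, String body) {
            this.header = header;
            this.body = body;
        }
    }

    /** Treats the first line as the header; everything after the first newline is the body. */
    public static Parsed split(String fieldValue) {
        int newline = fieldValue.indexOf('\n');
        if (newline < 0) {
            // No newline: the whole value is a header and there is nothing to tokenize.
            return new Parsed(fieldValue, "");
        }
        return new Parsed(fieldValue.substring(0, newline), fieldValue.substring(newline + 1));
    }

    /** Tokenizes only the body on whitespace; the header merely selects the behavior. */
    public static List<String> tokenize(String fieldValue) {
        Parsed parsed = split(fieldValue);
        // Hypothetical header value: "lowercase" switches lowercasing on.
        boolean lowercase = parsed.header.trim().equals("lowercase");
        List<String> tokens = new ArrayList<>();
        for (String token : parsed.body.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty body yields one empty string from split()
            }
            tokens.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("lowercase\nHello World\nSecond Line"));
    }
}
```

The sketch only covers the analysis side. It does not address the concern raised in the message that the header is still stored together with the text in the field.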

Re: Can an analyzer access other field's data during index time?

2023-04-24 Thread Mikhail Khludnev
Well, maybe something like https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ConditionalTokenFilter.html?

RE: Can an analyzer access other field's data during index time?

2023-04-24 Thread Wang, Guan
Hi Mikhail, Thank you for introducing the abstract class ConditionalTokenFilter to me! I took a quick look: it's a wrapper around the upstream TokenStream that applies the wrapped filters conditionally. So, if I have a document like:

  HEADER
  TEXT
  TEXT

Implementing ConditionalTokenFilter could only tokenize lines 2 and 3.
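
A rough plain-Java analogue of the conditional-wrapper idea discussed above, not Lucene code. In Lucene, ConditionalTokenFilter asks its `shouldFilter()` method whether the wrapped filters apply to the current token. Here that decision is modeled as a `Predicate<String>` and the wrapped filter as a `UnaryOperator<String>`. The class name `ConditionalFilterSketch`, the method `applyConditionally`, and the literal `HEADER` token are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class ConditionalFilterSketch {

    /**
     * Applies the wrapped transformation only to tokens for which shouldFilter
     * returns true; all other tokens pass through unchanged.
     */
    public static List<String> applyConditionally(List<String> tokens,
                                                  Predicate<String> shouldFilter,
                                                  UnaryOperator<String> wrapped) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(shouldFilter.test(token) ? wrapped.apply(token) : token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("HEADER", "Text", "Text");
        // Lowercase every token except the hypothetical HEADER marker.
        List<String> result = applyConditionally(
                tokens,
                token -> !token.equals("HEADER"),
                token -> token.toLowerCase(Locale.ROOT));
        System.out.println(result);
    }
}
```

As in the thread, the condition here can only look at the token in front of it. It has no view of other fields, which matches the earlier point that a Field has no reference to its enclosing Document.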