be used to yield
indexed and stored (Lucene) fields with different content.
If your logic is that comprehensive, you may also consider extracting the
analysis logic completely, e.g. via PreAnalyzedField:
https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html#the-preanalyzedfield-type
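
To illustrate what extracting the analysis could look like, here is a minimal
sketch that tokenizes outside of Lucene/Solr and builds a JSON payload for a
PreAnalyzedField, with a stored value that differs from the indexed tokens.
The key names ("v", "str", "tokens", "t", "s", "e", "i") follow the JSON format
as I understand it from the guide linked above, so please verify them there.
The class and method names are hypothetical, and the whitespace tokenization
merely stands in for whatever custom logic is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: all names here are hypothetical, not part of Solr or Lucene.
class PreAnalyzedSketch {

    // Minimal JSON string escaping (backslashes and double quotes only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a pre-analyzed JSON value: `stored` is kept verbatim as the
    // stored content, while the tokens come from `indexed`, split on
    // whitespace. Offsets ("s", "e") refer to positions in `indexed`.
    static String toPreAnalyzedJson(String stored, String indexed) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < indexed.length()) {
            // Skip whitespace between tokens.
            while (pos < indexed.length() && Character.isWhitespace(indexed.charAt(pos))) {
                pos++;
            }
            int start = pos;
            // Consume one token.
            while (pos < indexed.length() && !Character.isWhitespace(indexed.charAt(pos))) {
                pos++;
            }
            if (pos > start) {
                tokens.add("{\"t\":" + quote(indexed.substring(start, pos))
                        + ",\"s\":" + start + ",\"e\":" + pos + ",\"i\":1}");
            }
        }
        return "{\"v\":\"1\",\"str\":" + quote(stored)
                + ",\"tokens\":[" + String.join(",", tokens) + "]}";
    }

    public static void main(String[] args) {
        System.out.println(toPreAnalyzedJson("Hello, World!", "hello world"));
    }
}
```

Because the tokens are produced by the application, this code can look at the
whole document, including other fields, before deciding what to emit. Note that
the offsets in this sketch point into the indexed text rather than the stored
text, which may matter for features that rely on offsets.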
On Tue, Apr
time?
Guan,
I can hardly grasp the particular obstacle, but I don't think the task is out
of reach overall. Can you share a test case that formally describes the desired
behavior?
On Tue, Apr 25, 2023 at 12:29 AM Wang, Guan wrote:
> Hi Mikhail,
>
> Thank
2023 at 11:40 PM Wang, Guan wrote:
> Hi Mikhail,
>
> Thank you for the definitive answer!
>
> I could "solve" this by adding a header in the document with proper
> information to guide the indexing process. Header will be parsed then
> ignored by the tokenizer. However
he existing codebase where the Field has no
reference to enclosing Document. sigh.
On Mon, Apr 24, 2023 at 6:00 PM Wang, Guan wrote:
> Hi,
>
> I understand Lucene analyzer is per field basis. But I wonder if it's
> even possible for an analyzer on field A to be able to access da
Hi,
I understand that Lucene analyzers work on a per-field basis. But I wonder
whether it is even possible for an analyzer on field A to access data in field B
at any stage of the indexing process, say in a CharFilter, Tokenizer or
TokenFilter?
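
The workaround mentioned elsewhere in this thread (adding a header to the text
that guides the indexing process and is then ignored by the tokenizer) can be
sketched in plain Java as follows. This is not Lucene code: the "#hint="
header format, the "lowercase" hint and all names are assumptions made up for
illustration, and a real version would live inside a CharFilter or Tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: hypothetical names and header format, no Lucene dependency.
class HeaderGuidedSketch {

    // Hypothetical marker that introduces the one-line header.
    static final String HEADER_PREFIX = "#hint=";

    // Application side: fold the relevant value of field B into field A's text.
    // The hint must not contain a newline, since that ends the header.
    static String withHeader(String hintFromFieldB, String fieldAText) {
        return HEADER_PREFIX + hintFromFieldB + "\n" + fieldAText;
    }

    // Analysis side: parse the header if present, then tokenize only the body.
    static List<String> tokenize(String value) {
        String hint = "";
        String body = value;
        if (value.startsWith(HEADER_PREFIX)) {
            int newline = value.indexOf('\n');
            if (newline >= 0) {
                hint = value.substring(HEADER_PREFIX.length(), newline);
                body = value.substring(newline + 1);
            }
        }
        List<String> tokens = new ArrayList<>();
        for (String token : body.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens when the body is empty or whitespace only
            }
            // The hint from "field B" changes how "field A" is tokenized.
            tokens.add("lowercase".equals(hint) ? token.toLowerCase(Locale.ROOT) : token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(withHeader("lowercase", "Foo BAR")));
        System.out.println(tokenize(withHeader("keep", "Foo BAR")));
    }
}
```

One trade-off of this design is that the header becomes part of the field
value itself, so it would also appear in the stored content and shift character
offsets unless it is handled separately.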
I'd like to control the behavior of the indexing process fo
Hi Luke,
Regarding what you've described as a "bug" in NLPPOSTaggerOp, I agree with you
that there could be a more elegant solution than simply synchronizing the
entire method. That said, IMHO, I don't see a thread-safety issue.
issue. Lucene TokenFilters are not supposed to be shared am
Hi,
Could someone explain why the class SegmentingTokenizerBase uses a buffer of
only 1024 characters? A comment in the source code mentions that a token may be
truncated if no safe stopping index can be found among the chars currently in
the buffer.
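
To make the concern concrete, here is a small self-contained sketch of the
general technique: fill a fixed-size buffer, cut at the last safe stopping
index, and carry the remainder over to the next round. It is not Lucene's
actual implementation; the class name, the sentence-terminator rule and the
tiny buffer sizes are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical names, not the SegmentingTokenizerBase code.
class FixedBufferSketch {

    // Index just past the last sentence terminator in buf[0, len), or -1.
    static int findSafeEnd(char[] buf, int len) {
        for (int i = len - 1; i >= 0; i--) {
            char c = buf[i];
            if (c == '.' || c == '!' || c == '?') {
                return i + 1;
            }
        }
        return -1;
    }

    // Splits the input into chunks no longer than bufferSize (must be > 0).
    static List<String> chunks(Reader in, int bufferSize) {
        List<String> out = new ArrayList<>();
        char[] buf = new char[bufferSize];
        int len = 0;          // number of valid chars currently in buf
        boolean eof = false;
        try {
            while (true) {
                // Top up the buffer until it is full or the input ends.
                while (!eof && len < buf.length) {
                    int n = in.read(buf, len, buf.length - len);
                    if (n < 0) {
                        eof = true;
                    } else {
                        len += n;
                    }
                }
                if (len == 0) {
                    break;
                }
                int end = len;
                if (!eof) {
                    int safe = findSafeEnd(buf, len);
                    // No safe point in a full buffer: emit it all anyway,
                    // which can cut a long token in the middle.
                    end = safe > 0 ? safe : len;
                }
                out.add(new String(buf, 0, end));
                // Carry the unprocessed tail to the front of the buffer.
                System.arraycopy(buf, end, buf, 0, len - end);
                len -= end;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks(new StringReader("Hi there. Bye now."), 12));
        System.out.println(chunks(new StringReader("abcdefghij"), 4));
    }
}
```

With a 12-char buffer the first input is cut cleanly after "Hi there.", while
the second input has no stopping point within 4 chars and is split into
"abcd", "efgh" and "ij". The same effect would apply to a run longer than 1024
chars without a safe stopping index.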
It doesn't sound r