Re: How to delete a token that comes exactly after a token

Jack Krupansky Wed, 26 Feb 2014 17:03:35 -0800

If this is primarily an issue with the document input, as opposed to queries, you might be better off simply preprocessing the text before it is given to Lucene to be indexed.
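
A minimal sketch of that kind of preprocessing, assuming the markup always has the form "link:" followed by one run of non-whitespace characters (as in the "link:km" example quoted below). The class name WikiLinkStripper and the regular expression are illustrative assumptions, not anything from Lucene or from the wiki dump format:

```java
import java.util.regex.Pattern;

public class WikiLinkStripper {
    // Assumed markup shape: the literal prefix "link:" at a word boundary,
    // followed by one run of non-whitespace characters (e.g. "link:km").
    private static final Pattern LINK_MARKUP = Pattern.compile("\\blink:\\S+");

    // Returns the text with every "link:xyz" occurrence removed.
    // Surrounding whitespace is left alone; a tokenizer will ignore it.
    public static String strip(String text) {
        return LINK_MARKUP.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("distance link:km is 5"));
    }
}
```

The cleaned string would then be handed to the analyzer as usual, so neither "link" nor "km" ever becomes a token.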


-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI
Sent: Wednesday, February 26, 2014 1:37 PM
To: java-user@lucene.apache.org
Subject: Re: How to delete a token that comes exactly after a token

Hi;

I'm parsing a wiki dump file. It contains some special definitions, for
example:

link:km

so when I parse my text I get the tokens "link" and "km". I want to
remove "link", since it is a stopword in my case. However, I also want to
remove "km" when it comes directly after the token "link". If there is no
such implementation, can I submit a patch for it?

Thanks;
Furkan KAMACI

2014-02-26 17:36 GMT+02:00 Jack Krupansky <j...@basetechnology.com>:

Sounds like a custom filter.

Or maybe an option for stop filter or a specialization of stop filter.

Or maybe it could be even more generalized.

What are some practical example token sequences?
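
As a rough sketch of the custom-filter idea, here is the core logic in plain Java over a list of token strings: drop the trigger token and the single token that follows it. The class name DropAfterTrigger and its method are hypothetical. In Lucene this logic would presumably live inside a TokenFilter subclass's incrementToken(), which would also need to deal with position increments; that part is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class DropAfterTrigger {
    // Returns a new list without the trigger token and without the token
    // that comes immediately after each occurrence of the trigger.
    public static List<String> filter(List<String> tokens, String trigger) {
        List<String> kept = new ArrayList<>();
        boolean skipNext = false;
        for (String token : tokens) {
            if (token.equals(trigger)) {
                // Drop the trigger itself and remember to drop its successor.
                skipNext = true;
                continue;
            }
            if (skipNext) {
                // This token directly follows the trigger, so drop it too.
                skipNext = false;
                continue;
            }
            kept.add(token);
        }
        return kept;
    }
}
```

With the trigger "link", the sequence "the", "link", "km", "road" would be reduced to "the", "road", while a "km" that does not follow "link" is kept.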

-- Jack Krupansky

-----Original Message-----
From: Furkan KAMACI
Sent: Wednesday, February 26, 2014 9:48 AM
To: java-user@lucene.apache.org
Subject: How to delete a token that comes exactly after a token

Hi;

How can I delete a token that comes immediately after a given token, as
with a StopwordFilter?

Thanks;
Furkan KAMACI

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

