David Smiley created LUCENE-7342:
------------------------------------

             Summary: WordDelimiterFilter should observe KeywordAttribute to 
pass these tokens through
                 Key: LUCENE-7342
                 URL: https://issues.apache.org/jira/browse/LUCENE-7342
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: David Smiley


I have a text analysis requirement in which certain tokens must not be 
processed by WordDelimiterFilter (WDF); they should pass through that filter 
unchanged.  WDF, like several other TokenFilters, accepts a configurable list 
of protected words, but that list is static: it is materialized as a concrete 
CharArraySet.  Thus, for example, I can't protect tokens by regexp, nor can I 
protect them based on other attributes.

A simple solution that makes sense to me is to have WDF consult 
KeywordAttribute to decide whether it should skip a token.  KeywordAttribute 
seems fairly generic in how it can be used, although granted, today it's only 
used by the stemmers.  That attribute isn't named "StemmerIgnoreAttribute" or 
some such; it's generic, so I think it's fine for WDF to use it in a similar 
way.
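
To illustrate the intended behavior, here is a minimal sketch in plain Java. 
It deliberately does not use Lucene classes: the Token class, its keyword flag 
(standing in for KeywordAttribute), the method names, and the part-number 
regexp are all hypothetical, and the splitting rule is a crude stand-in for 
what WDF really does.  It only shows the idea that an upstream stage flags 
tokens by regexp and the delimiter stage passes flagged tokens through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java model of the proposal; none of these names are Lucene API.
final class KeywordAwareDelimiterSketch {

    /** Hypothetical token: text plus a flag modeling KeywordAttribute. */
    public static final class Token {
        public final String text;
        public final boolean keyword;

        public Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    /** Upstream stage: flag every term that fully matches the pattern. */
    public static List<Token> markKeywords(List<String> terms, Pattern protectedPattern) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, protectedPattern.matcher(term).matches()));
        }
        return out;
    }

    /** Simplified delimiter stage: split on non-alphanumerics unless flagged. */
    public static List<String> splitUnlessKeyword(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token token : tokens) {
            if (token.keyword) {
                out.add(token.text); // flagged tokens pass through untouched
                continue;
            }
            for (String part : token.text.split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example rule: two capital letters, a hyphen, then digits.
        Pattern partNumbers = Pattern.compile("[A-Z]{2}-\\d+");
        List<Token> marked = markKeywords(Arrays.asList("wi-fi", "AB-1234"), partNumbers);
        System.out.println(splitUnlessKeyword(marked)); // prints [wi, fi, AB-1234]
    }
}
```

The point of the design is that the decision about which tokens to protect 
lives in whatever stage sets the flag, so it can be driven by a regexp or by 
other attributes instead of a fixed word list.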



