Hi Jamie,

What does EmailFilter do?
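To make the EmailFilter question concrete, here is a hypothetical Java sketch that does not use Lucene. EmailFilter's source was not posted, so `expand()` only reconstructs the token order implied by the expansion string quoted below: full address, local part, domain, local pieces, domain labels, then the last two labels. The class name, the method names and the address `Jane.Smith@example.com.au` are made up for illustration. `phraseMatches()` is a simplified model of an exact phrase, where every token must sit at consecutive positions. It is not Lucene's actual PhraseQuery implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch only: the real EmailFilter was not posted. This
 * reconstructs the token order implied by the quoted expansion string and
 * compares an exact-phrase check with a single-term check.
 */
public class EmailExpansionSketch {

    // Assumed expansion order, inferred from the quoted string:
    // full address, local part, domain, local pieces, domain labels, last two labels.
    static List<String> expand(String email) {
        String lower = email.toLowerCase(Locale.ROOT);
        int at = lower.indexOf('@');
        String local = lower.substring(0, at);
        String domain = lower.substring(at + 1);
        List<String> tokens = new ArrayList<>();
        tokens.add(lower);
        tokens.add(local);
        tokens.add(domain);
        tokens.addAll(Arrays.asList(local.split("\\.")));
        String[] labels = domain.split("\\.");
        tokens.addAll(Arrays.asList(labels));
        if (labels.length >= 2) {
            tokens.add(labels[labels.length - 2] + "." + labels[labels.length - 1]);
        }
        return tokens;
    }

    // Simplified phrase semantics: every query token must appear at
    // consecutive positions (position increment 1) in the field.
    static boolean phraseMatches(List<String> field, List<String> phrase) {
        for (int start = 0; start + phrase.size() <= field.size(); start++) {
            if (field.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    // Single-term check: is the full (lowercased) address present as one token?
    static boolean termMatches(List<String> field, String email) {
        return field.contains(email.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String email = "Jane.Smith@example.com.au"; // hypothetical address
        List<String> indexed = expand(email);
        // prints "jane.smith@example.com.au jane.smith example.com.au jane smith example com au com.au"
        System.out.println(String.join(" ", indexed));
        System.out.println(phraseMatches(indexed, expand(email))); // true
        System.out.println(termMatches(indexed, email));           // true

        // Hypothetical field whose expansion lacks the trailing "com.au" token:
        List<String> shorter = indexed.subList(0, indexed.size() - 1);
        System.out.println(phraseMatches(shorter, expand(email))); // false
        System.out.println(termMatches(shorter, email));           // true
    }
}
```

Under these assumptions, the phrase only matches when the index-time and query-time expansions agree token for token. The single-term check only needs the full address to survive as one token. If EmailFilter actually stacks tokens at the same position or orders them differently, this model does not apply.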
Why is the expanded form "required for the UAX29URLEmailTokenizer"? Seems like an exact match would work on the email address alone, without the expanded components?

Do you have an example of a query that reproducibly matches more documents than it should, and a document that matched but shouldn't have?

Steve

On Mar 28, 2014, at 7:00 AM, Jamie <ja...@mailarchiva.com> wrote:

> Greetings
>
> We have a problem whereby Lucene 4.7 occasionally does not apply a filter
> query during searching. The problem is intermittent. One in thirty or so
> searches will return what appears to be an unfiltered result set. There are
> no exceptions or errors occurring, just incorrect results. We are using
> realtime search with multiple index readers. Our software had been working
> fine with earlier versions of Lucene. I've double-checked the query submitted
> to Lucene and it appears to be correct. The query looks as follows:
>
> 2014-03-28 21:16:38 t.c.s.a.s.StandardSearch [DEBUG] start search
> {searchquery='',query='*:*',filterQuery='QueryWrapperFilter(+archivedate:[201002280000
> TO 201403282115] +cat:email +(to:"john.doug...@mycompany.com.au john.douglas
> mycompany.com.au john douglas mycompany com au com.au"
> to:"john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany
> com au com.au" from:"john.doug...@mycompany.com.au john.douglas
> mycompany.com.au john douglas mycompany com au com.au"
> from:"john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany
> com au com.au" cc:"john.doug...@mycompany.com.au john.douglas
> mycompany.com.au john douglas mycompany com au com.au"
> cc:"john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany
> com au com.au"))',sort='<long: "mydate">!'}
>
> The string "john....@mycompany.com.au john.doe mycompany.com.au john doe
> mycompany com au com.au" is the required expansion for the
> UAX29URLEmailTokenizer. By using quotes, I am aiming for an exact match.
> This works most of the time, but not all of the time (it should work every time).
>
> I came across https://issues.apache.org/jira/browse/LUCENE-5502 and applied
> it, but it makes no difference. I tried to downgrade Lucene, but it won't read
> the 4.6 indexes. Can anyone suggest a way forward?
>
> Thanks for your recommendations
>
> Jamie
>
> -------------------------
>
> import java.io.IOException;
> import java.io.Reader;
>
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.analysis.core.StopAnalyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
> import org.apache.lucene.analysis.util.CharArraySet;
> import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
> import org.apache.lucene.util.Version;
>
> public final class EmailAnalyzer extends StopwordAnalyzerBase {
>
>   public static final int DEFAULT_MAX_TOKEN_LENGTH = StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH;
>   private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
>   public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
>
>   public EmailAnalyzer(Version matchVersion, CharArraySet stopWords) {
>     super(matchVersion, stopWords);
>   }
>
>   public EmailAnalyzer(Version matchVersion) {
>     this(matchVersion, STOP_WORDS_SET);
>   }
>
>   public EmailAnalyzer(Version matchVersion, Reader stopwords) throws IOException {
>     this(matchVersion, loadStopwordSet(stopwords, matchVersion));
>   }
>
>   public void setMaxTokenLength(int length) {
>     maxTokenLength = length;
>   }
>
>   public int getMaxTokenLength() {
>     return maxTokenLength;
>   }
>
>   @Override
>   protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>     final UAX29URLEmailTokenizer src = new UAX29URLEmailTokenizer(matchVersion, reader);
>     src.setMaxTokenLength(maxTokenLength);
>     TokenStream tok = new EmailFilter(src); // custom filter; its source was not included
>     tok = new LowerCaseFilter(matchVersion, tok);
>     return new TokenStreamComponents(src, tok) {
>       @Override
>       protected void setReader(final Reader reader) throws IOException {
>         // Re-apply the configured max token length each time the components are reused.
>         src.setMaxTokenLength(EmailAnalyzer.this.maxTokenLength);
>         super.setReader(reader);
>       }
>     };
>   }
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>