On Mon, 2006-07-24 at 13:51 +0200, karl wettin wrote:
> On Mon, 2006-07-24 at 00:34 -0400, Yonik Seeley wrote:
> > > filter words with a dash
> > >
> > > ["x-men"]
> > > ["xmen"]
> > > ["x", "men"]
> > >
> > > The problem is ["x", "men"] requiring a distance between the terms
> > > and thus also matching "x-men men".
> >
> > WordDelimiterFilter from Solr does this
> > > It also has the false match problem you mention...
>
> Will it effect a phrase query?
>
> I.e. would "the xmen are" be a no-match as the filtered index data
> would be "the x (men|xmen|x-men) are here"?
>
> I'll write a test now.

Yes, it affects PhraseQuery. Only "the x men are" will match.

    package org.apache.solr.analysis;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    import java.io.Reader;
    import java.util.HashSet;

    public class TestWordDelimiterFilter {

      public static void main(String[] args) throws Exception {
        final String field = "field";

        // Index a single document through the WordDelimiterFilter analyzer below.
        Directory dir = new RAMDirectory();
        Analyzer a = new Analyzer();
        IndexWriter w = new IndexWriter(dir, a, true);
        Document d = new Document();
        d.add(new Field(field, "the x-men are here", Field.Store.NO,
            Field.Index.TOKENIZED, Field.TermVector.NO));
        w.addDocument(d);
        w.close();

        IndexSearcher is = new IndexSearcher(dir);

        // "the x-men are": no match
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term(field, "the"));
        pq.add(new Term(field, "x-men"));
        pq.add(new Term(field, "are"));
        System.out.println(is.search(pq).length());

        // "the xmen are": no match
        pq = new PhraseQuery();
        pq.add(new Term(field, "the"));
        pq.add(new Term(field, "xmen"));
        pq.add(new Term(field, "are"));
        System.out.println(is.search(pq).length());

        // "the x men are": the only phrase that matches
        pq = new PhraseQuery();
        pq.add(new Term(field, "the"));
        pq.add(new Term(field, "x"));
        pq.add(new Term(field, "men"));
        pq.add(new Term(field, "are"));
        System.out.println(is.search(pq).length());

        is.close();
        dir.close();
      }

      // Nested analyzer: StandardAnalyzer with an empty stop word set,
      // followed by WordDelimiterFilter.
      public static class Analyzer extends org.apache.lucene.analysis.Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
          TokenStream ts = new StandardAnalyzer(new HashSet()).tokenStream(fieldName, reader);
          ts = new WordDelimiterFilter(ts, 1, 1, 0, 0, 0);
          return ts;
        }
      }
    }
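
The position effect discussed in this thread can be sketched without Lucene. The following is only a toy model, not Lucene or Solr code: the class PhrasePositionSketch and its methods analyze and phraseMatches are hypothetical names, and the analyzer is assumed to do nothing more than lowercase the text and split it on whitespace and dashes. It shows why an exact phrase only matches when each query term sits at the next consecutive position of the indexed token stream.

```java
import java.util.ArrayList;
import java.util.List;

public class PhrasePositionSketch {

    // Hypothetical stand-in for the analyzer: lowercases the text, then splits
    // on whitespace and dashes, so "x-men" becomes two tokens at two positions.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase().split("[\\s-]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Exact phrase match (no slop): the phrase terms must occupy
    // consecutive positions somewhere in the indexed token list.
    static boolean phraseMatches(List<String> indexed, String... phrase) {
        for (int start = 0; start + phrase.length <= indexed.size(); start++) {
            boolean all = true;
            for (int i = 0; i < phrase.length; i++) {
                if (!indexed.get(start + i).equals(phrase[i])) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = analyze("the x-men are here");
        System.out.println(doc);                                          // [the, x, men, are, here]
        System.out.println(phraseMatches(doc, "the", "x-men", "are"));    // false
        System.out.println(phraseMatches(doc, "the", "xmen", "are"));     // false
        System.out.println(phraseMatches(doc, "the", "x", "men", "are")); // true
    }
}
```

In this model, making "the xmen are" match would require the index to also carry "xmen" as an alternative token, which is where the question of token positions and false matches raised earlier in the thread comes in.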