Hi, 

We are using doc.add(Field.Text("keywords", keywords)); to add the keywords to
the document, where keywords is a comma-separated string of keywords.
Lucene seems to tokenize multi-word keywords such as MAIN BOARD into separate
terms (i.e. MAIN and BOARD); tokenization appears to split on both commas and
spaces. So if we search for "MAIN BOARD", documents with keywords such as
"MAIN LOGIC" or "MAIN PARTS" also show up.
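
To illustrate what we think is happening, here is a small plain-Java sketch. It is not Lucene code: the class TokenizedMatchSketch, the split rule (commas and whitespace) and the "any shared word matches" rule are only our assumptions about the behaviour we observe.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java illustration only; this is not Lucene code.
class TokenizedMatchSketch
{
    // Assumption: the analyzer splits the text on commas and whitespace.
    static Set<String> tokens(String text)
    {
        Set<String> result = new HashSet<String>();
        for (String part : text.split("[,\\s]+"))
        {
            if (part.length() > 0)
            {
                result.add(part.toUpperCase());
            }
        }
        return result;
    }

    // Assumption: a document matches if it shares at least one word with the query.
    static boolean matches(String keywords, String query)
    {
        Set<String> shared = tokens(keywords);
        shared.retainAll(tokens(query));
        return !shared.isEmpty();
    }

    public static void main(String[] args)
    {
        // "MAIN LOGIC" is hit by the query "MAIN BOARD" because both contain MAIN.
        System.out.println(matches("MAIN LOGIC,POWER SUPPLY", "MAIN BOARD")); // true
        System.out.println(matches("POWER SUPPLY,FAN", "MAIN BOARD"));        // false
    }
}
```

Under these assumptions, any keyword that shares a single word with the query produces a hit, which is the unwanted behaviour.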

If someone searches for "MAIN BOARD", we want to get only the documents that
have the keyword "MAIN BOARD". How can we do this?

To achieve this we switched to doc.add(Field.Keyword("keywords", keywords));.
When searching we cannot use the standard analyzer, because it splits the
query wherever a keyword contains a space, so we wrote a KeywordAnalyzer
(which simply returns the whole input as one single token), as given below.

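import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * Tokenizes the entire stream as a single token.
 */
public class KeywordAnalyzer extends Analyzer
{
    public TokenStream tokenStream(String fieldName, final Reader reader)
    {
        return new TokenStream()
        {
            private boolean done;
            private final char[] chunk = new char[1024];

            public Token next() throws IOException
            {
                if (done)
                {
                    return null; // the single token has already been returned
                }
                done = true;

                // Read the whole input into one string.
                StringBuffer text = new StringBuffer();
                int length;
                while ((length = reader.read(chunk)) != -1)
                {
                    text.append(chunk, 0, length);
                }

                // Upper-case the text. Field.Keyword values are indexed without
                // analysis, so this only matches keywords indexed in upper case.
                String value = text.toString();
                return new Token(value.toUpperCase(), 0, value.length());
            }
        };
    }
}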

This solves the problem described above, but now we are not able to do
wildcard searches like MAIN*.

We need both pieces of functionality, i.e.:
1. If the user searches for MAIN BOARD, they should get only documents that
contain MAIN BOARD, and not MAIN LOGIC, MAIN, MAIN PART, etc.
2. The user should be able to do wildcard searches like MAIN* and get the
desired documents.
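
To make the two requirements concrete, here is a minimal plain-Java sketch of the matching behaviour we are after. It is not Lucene code and not a proposed solution. The class KeywordMatchSketch and its methods are made up for illustration, and it assumes each comma-separated keyword is treated as its own unit and that a trailing * means "starts with".

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the desired behaviour; this is not Lucene code.
class KeywordMatchSketch
{
    // Split only on commas so that multi-word keywords stay intact.
    static List<String> wholeKeywords(String keywords)
    {
        List<String> result = new ArrayList<String>();
        for (String part : keywords.split(","))
        {
            String keyword = part.trim().toUpperCase();
            if (keyword.length() > 0)
            {
                result.add(keyword);
            }
        }
        return result;
    }

    // A trailing '*' means prefix match; anything else must equal a whole keyword.
    static boolean matches(String keywords, String query)
    {
        String q = query.trim().toUpperCase();
        boolean prefix = q.endsWith("*");
        if (prefix)
        {
            q = q.substring(0, q.length() - 1);
        }
        for (String keyword : wholeKeywords(keywords))
        {
            if (prefix ? keyword.startsWith(q) : keyword.equals(q))
            {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(matches("MAIN BOARD,FAN", "MAIN BOARD")); // true
        System.out.println(matches("MAIN LOGIC,FAN", "MAIN BOARD")); // false
        System.out.println(matches("MAIN LOGIC,FAN", "MAIN*"));      // true
    }
}
```

The first two calls correspond to requirement 1 and the last call to requirement 2.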

Please let us know how we should do the indexing, and which analyzer we should
use for searching.

thanks
Rahul...
