Hi Erik, Thanks for the reply.
Here are the answers of your queries que: - The question back to you is do you want searches for simply "MAIN" to find both "MAIN LOGIC" and "MAIN PARTS"? Or should it return no documents since its not an exact match? Ans: It should return no documents since it is not a exact match. I would like to explain my point 2 in below mail again by giveing an example here eg- If there are two records with keyword ie "Main Logic" and "Main Board" respectively. what we want is 1. if the user serch on Main*, both the record should return. 2. If the user search on "Main Logic", only one record should be return. 3. If the user searches for Main, no record should be returned point 2 above is working now because we are using Field.Keyword (which doesn't tokenise the spaces) and KeywordAnalyser. point 1 (wild card search) is not working because KeywordAnalyser does not recognise wild card would appreciate more information on this. thanks in advance. rahul On Wed, 13 Jul 2005 Erik Hatcher wrote : > >On Jul 13, 2005, at 8:18 AM, Rahul D Thakare wrote: >> We are using doc.add(Field.Text("keywords",keywords)); to add the keywords >> to the document, where keywords is comma separated keywords string. > >If the text is already comma separated and that is the level at which you >things tokenized, then simply do something like this (untested pseudo-code): > > String[] values = keywords.split(","); > for (int i=0; i < values.length; i++) > doc.add(Field.Keyword("keywords", values[i])); > >>Lucene seems to tokenize the keywords with multiple words like(MAIN BOARD) >>as different keywords(ie as MAIN and BOARD). Tokenization is based on comma >>and space...So if we search for "MAIN BOARD", documents having keywords like >>"MAIN LOGIC", "MAIN PARTS", etc also show up >> >>If one searches for "MAIN BOARD", we want get only the documents have "MAIN >>BOARD". How to do this ? > >The question back to you is do you want searches for simply "MAIN" to find >both "MAIN LOGIC" and "MAIN PARTS"? Or should it return no documents since >its not an exact match? > >Using the above code, "MAIN" would find neither of those and the query would >have to be exact. I see below you've clarified this requirement... > >>To achieve this we used doc.add(Field.Keyword("keywords", keywords)); and >>while searching >>we cannot use standard analyzer, while searching, as divides the keywords if >>we search keywords having space... so we wrote an >>KeywordAnalyser(KeywordAnalyzer is basically returns only one single token) >>as given below. > >There is a KeywordAnalyzer now in the contrib/analyzers codebase, and it will >ship with the next version of Lucene (or you could build it yourself and use >it). There is also a couple of variants of the KeywordAnalyzer in the Lucene >in Action code (www.lucenebook.com). > >>Which solve the above said problem, but we are not able to the wild card >>searchs like MAIN*, etc. >> >>We need both the functionality ie. >>1. if user searches for MAIN BOARD, should get only documents that contain >>MAIN BOARD and not MAIN LOGIC, MAIN, MAIN PART etc. >>2. User should be able to do the wild card search like MAIN*, etc and get >>the desired documents. >> >>Please let us know, how we should do the indexing ? and which analyzer to >>use to do the search ? > >There are many ways to go about this sort of thing, and I apologize for being >short on time and not able to explain them all fully. One option is to keep >the tokenization using a traditional analyzer so that it separates by >whitespace, but when a user queries it turns into a PhraseQuery. If you >really mean for wildcards to be single words in the field (in other words, >users don't need to query on MA*) then the space separated tokenization would >work fine here as well. > >It is important to think through the analysis process as well as the search >interface issues (the interface must be given thorough consideration and >treated as a first class citizen when discussing implementations), especially >when wildcard and range queries come up. It has been a hot topic recently on >how to deal with wildcards and ranges efficiently. In your example, if by >"MAIN*" you intend for the word MAIN to be a unique token and the user would >choose a full word to search upon and merely wants to find it within a larger > field then wildcards are not necessary. > > Erik > > >--------------------------------------------------------------------- >To unsubscribe, e-mail: [EMAIL PROTECTED] >For additional commands, e-mail: [EMAIL PROTECTED] >