This was the problem; it worked excellently! Thanks for the help!
-Albert


karl wettin-3 wrote:
>
> 27 Aug 2007 at 20:30, anorman wrote:
>
>> I've tried to implement an analyzer that does little different other
>> than using:
>>
>>   result = new ISOLatin1AccentFilter(result);
>>
>> in the tokenStream method.
>>
>> Everything appears to work; however, my search will not work for any
>> word with diacritics with that change. Without it, the search will find
>> words such as "cèdulas" but not "cedulas"; with the change it will find
>> neither. It just appears to be stripping it out altogether.
>
> Do you use the same analyzer when searching as when creating the index?
>
> --
> karl
>
>> Any suggestions?
>>
>>
>> anorman wrote:
>>>
>>> Can I do this at search time rather than index time? Below is my code
>>> that is handling the searching; where would I utilize such a filter?
>>>
>>> Thanks for the help!
>>>
>>>
>>> package search.lucene.search;
>>> import org.apache.lucene.document.Document;
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import org.apache.lucene.analysis.Analyzer;
>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
>>> import org.apache.lucene.queryParser.ParseException;
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import org.apache.lucene.search.Hits;
>>> import org.apache.lucene.search.IndexSearcher;
>>> import org.apache.lucene.search.Query;
>>>
>>> import search.lucene.index.IndexManager;
>>>
>>> /**
>>>  * This class is used to search the
>>>  * Lucene index and return search results
>>>  */
>>> public class SearchManager {
>>>
>>>     private String searchWord;
>>>
>>>     private IndexManager indexManager;
>>>
>>>     private Analyzer analyzer;
>>>
>>>     public SearchManager(String searchWord){
>>>         this.searchWord = searchWord;
>>>         this.indexManager = new IndexManager();
>>>         this.analyzer = new StandardAnalyzer();
>>>     }
>>>
>>>     /**
>>>      * do search
>>>      */
>>>     public List search(){
>>>         List searchResult = new ArrayList();
>>>
>>>         IndexSearcher indexSearcher = null;
>>>
>>>         try{
>>>             indexSearcher = new IndexSearcher(indexManager.getIndexDir());
>>>         }catch(IOException ioe){
>>>             ioe.printStackTrace();
>>>         }
>>>
>>>         QueryParser queryParser = new QueryParser("content",analyzer);
>>>         Query query = null;
>>>         try {
>>>             query = queryParser.parse(searchWord);
>>>         } catch (ParseException e) {
>>>             e.printStackTrace();
>>>         }
>>>
>>>         if(null != query && null != indexSearcher){
>>>             try {
>>>                 Hits hits = indexSearcher.search(query);
>>>                 for(int i = 0; i < hits.length(); i ++){
>>>
>>>                     Document doc = hits.doc(i);
>>>                     System.out.println(doc.get("filename"));
>>>
>>>                     SearchResultBean resultBean = new SearchResultBean();
>>>
>>>                     resultBean.setXMLId(hits.doc(i).get("id"));
>>>                     resultBean.setXMLTitle(hits.doc(i).get("title"));
>>>                     resultBean.setXMLAuthor(hits.doc(i).get("author"));
>>>                     resultBean.setXMLAbstract(hits.doc(i).get("abstract"));
>>>                     resultBean.setScore(hits.score(i));
>>>
>>>                     searchResult.add(resultBean);
>>>                 }
>>>             } catch (IOException e) {
>>>                 e.printStackTrace();
>>>             }
>>>         }
>>>         return searchResult;
>>>
>>>     }
>>>
>>>
>>> thomas arni-2 wrote:
>>>>
>>>> You can extend the DefaultAnalyzer.
>>>> The only thing you have to do is to rewrite the method tokenStream
>>>> like this:
>>>>
>>>> /** Constructs a {@link StandardTokenizer} filtered by a {@link
>>>>  * StandardFilter}, a {@link LowerCaseFilter} and a {@link
>>>>  * StopFilter}. */
>>>> public TokenStream tokenStream(String fieldName, Reader reader) {
>>>>     TokenStream result = new StandardTokenizer(reader);
>>>>     result = new StandardFilter(result);
>>>>     result = new LowerCaseFilter(result);
>>>>     result = new StopFilter(result, stopSet);
>>>>     result = new ISOLatin1AccentFilter(result);
>>>>     return result;
>>>> }
>>>>
>>>>
>>>> anorman wrote:
>>>>> This looks like exactly what I want. Would I implement this along
>>>>> with another analyzer such as the standard one, or stand-alone? Does
>>>>> anyone have any code examples of implementing such a thing?
>>>>>
>>>>> Thanks,
>>>>> Albert
>>>>>
>>>>>
>>>>> karl wettin-3 wrote:
>>>>>
>>>>>> 27 Aug 2007 at 16:03, anorman wrote:
>>>>>>
>>>>>>> I have a searchable index of documents which contain French and
>>>>>>> Spanish diacritics (è, é, À, etc.). I would like to make the
>>>>>>> content searchable so that when a user searches for a word such as
>>>>>>> "Amèrique" or "Amerique" (without diacritic), it returns the same
>>>>>>> results.
>>>>>>>
>>>>>>> Has anyone set up something similar?
>>>>>>>
>>>>>> ISOLatin1AccentFilter
>>>>>>
>>>>>> --
>>>>>> karl
>>
>> --
>> View this message in context: http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12354962
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12366572
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
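
For anyone reading this thread later, here is a rough sketch of why the analyzer used for searching has to match the one used when the index was built. It does not use the Lucene API. It is a plain-JDK toy in which java.text.Normalizer stands in for ISOLatin1AccentFilter and a HashMap stands in for the index. The class AccentFoldingDemo and the helpers fold, plain, index and matches are hypothetical names made up for this illustration. The real filter's behaviour may differ in detail.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AccentFoldingDemo {

    /** Lowercases a term and strips combining accent marks (stand-in for accent folding). */
    public static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    /** Lowercases only: stand-in for an analyzer without accent folding. */
    public static String plain(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    /** Builds a toy inverted index: term -> ids of the documents containing it. */
    public static Map<String, List<Integer>> index(String[] docs, boolean foldAccents) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String token : docs[id].split("\\s+")) {
                String term = foldAccents ? fold(token) : plain(token);
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(id);
            }
        }
        return postings;
    }

    /** Analyzes a single-word query the chosen way and looks it up in the toy index. */
    public static boolean matches(Map<String, List<Integer>> postings,
                                  String query, boolean foldAccents) {
        String term = foldAccents ? fold(query) : plain(query);
        return postings.containsKey(term);
    }

    public static void main(String[] args) {
        String accented = "c\u00e8dulas"; // the word "cèdulas" from the thread
        String[] docs = { "las " + accented + " nuevas" };

        Map<String, List<Integer>> unfolded = index(docs, false);
        Map<String, List<Integer>> folded = index(docs, true);

        // No folding anywhere: only the accented spelling is found.
        System.out.println(matches(unfolded, accented, false));  // true
        System.out.println(matches(unfolded, "cedulas", false)); // false

        // Folding at search time only, old index: neither spelling is found.
        System.out.println(matches(unfolded, accented, true));   // false
        System.out.println(matches(unfolded, "cedulas", true));  // false

        // Folding at both index and search time: both spellings are found.
        System.out.println(matches(folded, accented, true));     // true
        System.out.println(matches(folded, "cedulas", true));    // true
    }
}
```

The middle case is the symptom described above. The index still holds the accented term, while every query is folded to the unaccented form, so nothing matches until the index is rebuilt with the same analyzer.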