I'd use StandardAnalyzer or ClassicAnalyzer. It also depends on how you want
to search. You probably want a query for "John Smith" to match "John Smith"
and "Smith, John" but maybe not "John Brown and Sam Smith". The latter is a
problem. You can partially work round it by using a BooleanQuery made up of
a PhraseQuery and/or a SpanNearQuery with a small slop and inOrder set to
true, plus a general catch-all clause, with boosts on the first two.

If this is real-world data there will always be exceptions and problems.

--
Ian.

On Fri, Nov 23, 2012 at 2:36 PM, Carsten Schnober <schno...@ids-mannheim.de> wrote:
> Hi,
> I'm indexing names in a dedicated Lucene field and I wonder which
> analyzer to use for that purpose. Typically, the names are in the format
> "John Smith", so the WhitespaceAnalyzer is likely the best in most
> cases. The field type to choose seems to be the TextField.
> Or, would you rather recommend using the KeywordAnalyzer? I'm a bit
> cautious about that because I'm afraid of wildcard or regex queries such
> as "*Smith" or ".*Smith" respectively.
>
> However, there might also be special cases and spelling exceptions of
> all kinds, e.g. "Smith, John", "John 'Hammmer' Smith", "Abd al-Aziz",
> "Stan van Hoop" and what else one could imagine. Is there a special
> Analyzer that is optimized on dealing with such cases or do I have to do
> normalization beforehand?
> I see that such special characters and spellings can easily be covered
> by the right queries, but that requires the user to know the exact
> spelling, which is what I'm trying to spare her.
>
> Best regards,
> Carsten
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
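
To make the three-clause idea from the reply concrete, here is a rough plain-Java toy model of it. It is not Lucene code and does not reproduce Lucene's scoring. The class name, the tokenizer, the boost values (4, 2, 1) and the slop of 2 are all assumptions made up for this illustration; the reply only says that the first two clauses get boosts. The three checks loosely correspond to a PhraseQuery, a SpanNearQuery with inOrder set to true, and a catch-all clause on the individual terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of a three-clause name query. Plain Java, not Lucene code. */
class NameMatchSketch {

    // Hypothetical boosts, chosen only so the three tiers are easy to tell apart.
    static final double PHRASE_BOOST = 4.0;
    static final double NEAR_BOOST = 2.0;
    static final double CATCH_ALL_BOOST = 1.0;

    /** Crude stand-in for an analyzer: lowercase, split on non-letter/non-digit runs. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** True if the query terms occur in order with at most `slop` other tokens in between. */
    static boolean inOrderWithin(List<String> doc, List<String> query, int slop) {
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(query.get(0))) {
                continue;
            }
            int pos = start;
            boolean found = true;
            // Greedily take the earliest later occurrence of each following term.
            for (int q = 1; q < query.size() && found; q++) {
                int next = pos + 1;
                while (next < doc.size() && !doc.get(next).equals(query.get(q))) {
                    next++;
                }
                found = next < doc.size();
                pos = next;
            }
            // Tokens covered by the match, minus the query terms themselves.
            if (found && (pos - start + 1) - query.size() <= slop) {
                return true;
            }
        }
        return false;
    }

    /** Sums the boosts of every clause that matches the indexed name. */
    static double score(String name, String queryText, int slop) {
        List<String> doc = tokenize(name);
        List<String> query = tokenize(queryText);
        double score = 0.0;
        if (inOrderWithin(doc, query, 0)) {
            score += PHRASE_BOOST;          // exact phrase clause
        }
        if (inOrderWithin(doc, query, slop)) {
            score += NEAR_BOOST;            // in-order proximity clause
        }
        for (String term : query) {
            if (doc.contains(term)) {
                score += CATCH_ALL_BOOST;   // catch-all clause, one point per matched term
            }
        }
        return score;
    }

    public static void main(String[] args) {
        String[] names = {
            "John Smith", "John 'Hammmer' Smith", "Smith, John",
            "John Brown and Sam Smith", "Stan van Hoop"
        };
        for (String name : names) {
            System.out.println(score(name, "John Smith", 2) + "  " + name);
        }
    }
}
```

With a slop of 2, this toy scores "John Smith" at 8.0, "John 'Hammmer' Smith" at 4.0, and "Stan van Hoop" at 0.0. "Smith, John" and "John Brown and Sam Smith" both get 2.0, because only the catch-all clause matches them. That tie may be part of why the reply calls the approach a partial workaround. Real Lucene scoring also weighs term frequency and field length, so actual rankings will differ.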