It depends completely on which analyzer you use. Conceptually, an Analyzer is composed of a Tokenizer followed by zero or more Filters: the input stream is broken up by the Tokenizer, and each resulting token then passes through the Filters in order (e.g. LowerCaseFilter, StopFilter).
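To make the Tokenizer-then-Filters idea concrete, here is a minimal sketch in plain Java. It is deliberately not the Lucene API: ToyAnalyzer, its method names, and the stop-word list are all made up for illustration, and real Lucene analyzers work on token streams rather than lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an analysis chain: one tokenizer, then filters applied in order.
// Hypothetical names throughout; this only mimics the concept, not Lucene itself.
final class ToyAnalyzer {
    // Arbitrary stop-word list, assumed for the example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is"));

    // "Tokenizer": split on runs of whitespace, leaving punctuation untouched.
    static List<String> tokenizeOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // "Filter" 1: lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // "Filter" 2: drop tokens that appear in the stop-word set.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The "Analyzer": the tokenizer followed by the filters, in a fixed order.
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenizeOnWhitespace(text)), STOP_WORDS);
    }

    public static void main(String[] args) {
        // Prints [name, jane, doe-smith]
        System.out.println(analyze("The name is Jane Doe-Smith"));
    }
}
```

Note that the order of the filters matters: because lower-casing runs first, "The" is reduced to "the" and then removed by the stop-word step. Swapping the two filters would let "The" through.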

The reason I'm not answering your question directly is that I can't. If you choose, say, a WhitespaceAnalyzer, which is built from a WhitespaceTokenizer, then your hyphens and apostrophes will pass through as-is, and your tokens (the minimal searchable units) will be "Jane" "Doe-Smith" and "Sa'eed", capitals and all. If you choose StandardAnalyzer, built on StandardTokenizer and several filters, your tokens would most likely be "jane" "doe" "smith" and "sa'eed": lower-cased and split at the hyphen, but with the apostrophe between letters kept. The exact rules differ between Lucene versions, so check the output against the version you are running.

You can build your own Analyzers to process text however you please. Lucene in Action has quite a thorough explanation of this process; you'll save yourself a bunch of time by reading those sections. You can get the second edition of that book in electronic form from Manning through their early access program.

Until you understand this process well, I'd recommend that you be very, very sure that you use the *same* analyzer for both indexing and searching, or your results will be...surprising.

Think about getting a copy of Luke to examine your indexes. That tool makes it easy to see the effects of various Analyzers. Google "Lucene Luke".

Finally, you can easily use *different* analyzers for different fields within a document; see PerFieldAnalyzerWrapper.

HTH
Erick

On Sun, Dec 27, 2009 at 5:48 PM, syedfa <fayyazud...@gmail.com> wrote:

> Dear fellow Java developers:
>
> I have a very basic question about indexing text using Lucene. I am
> indexing a large amount of text, that includes names that contain certain
> punctuation (eg. "Jane Doe-Smith", "Sa'eed", etc.) Will the punctuation
> throw off the indexer in any way, such that it breaks up the tokens when
> they shouldn't be, or will the indexer simply treat the punctuation inside
> the names as any other character, and the presence of the punctuation will
> not in any way hinder a user's ability to search for that name? Are there
> any precautions that I should take to avoid any problems?
>
> I hope this question is clear and makes sense.
>
> Thanks in advance to all who reply.