Mile,

Any Analyzer whose Tokenizer throws out non-letter characters will do.
For example, take a look at SimpleAnalyzer, which uses LowerCaseTokenizer.  If 
you read the javadoc for LowerCaseTokenizer, I think you will see it suits you.
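
One caveat, as far as I can tell from the javadoc: LowerCaseTokenizer divides
text at non-letters, so "[divinit]atis" would most likely come out as two
tokens ("divinit" and "atis") rather than the single word "divinitatis". If
you need the single word, one option is to strip the brackets from the text
yourself before handing it to the Analyzer. Below is a minimal sketch of that
in plain Java. The class and method names (BracketStripper, stripBrackets) are
made up for illustration and are not part of Lucene.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: removes square brackets from
// text so that restored letters are joined back onto their word.
public class BracketStripper {

    // Character class matching a literal '[' or ']'.
    private static final Pattern BRACKETS = Pattern.compile("[\\[\\]]");

    // Returns the input with every '[' and ']' removed; all other
    // characters are left untouched.
    public static String stripBrackets(String text) {
        return BRACKETS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The example word from the question below.
        System.out.println(stripBrackets("[divinit]atis")); // prints "divinitatis"
    }
}
```

You would then index the cleaned string with whatever Analyzer you already
use. If you also want to display the original bracketed form, you could keep
it in a separate stored field.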

Otis

----- Original Message ----
From: Mile Rosu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, May 31, 2006 11:47:12 AM
Subject: Removing brackets before indexing

Hello!

I am currently trying to index Latin-language documents, in which
missing letters are added to words using square brackets, like
this: "[divinit]atis". 

Could you please tell me what the best practice would be for removing the
brackets before adding the text to the Lucene index? (In the example, the
word "divinitatis" should be stored.)

Thank you a lot,
Mile Rosu

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




