Are you using ISOLatin1AccentFilter?
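
In case it helps to see why I ask, here is a minimal sketch of what accent folding does to a term. It is plain JDK code built on java.text.Normalizer, not Lucene's ISOLatin1AccentFilter, and the class name AccentFold and the method fold are made up for illustration. The point is only that a folded term and an unfolded one are different strings, so if folding happens on one side (indexing or searching) and not on the other, the lookup can miss.

```java
import java.text.Normalizer;

public class AccentFold {
    // Hypothetical helper: decompose accented characters (NFD), then drop
    // the combining marks, leaving only the base letters.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "heran\u00e7a" is "herança"; the escape avoids source-encoding issues.
        String original = "heran\u00e7a";
        System.out.println(fold(original));                   // prints "heranca"
        System.out.println(original.equals(fold(original)));  // prints "false"
    }
}
```

If the index holds the folded form, a query for the unfolded form will not find it unless the same folding is applied at search time.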

Regards,

Lucas Frare A. Teixeira
[EMAIL PROTECTED]
Tel: +55 11 3660.1622 - R3018



Vinicius Carvalho wrote:
Hello there! I'm indexing documents using the BrazilianAnalyzer, and I've
noticed that many words are not being indexed. I store and index the entire
doc (I'm doing this in order to present fragments in the results; I don't
know if it's the best way, especially for large docs. Any ideas?). Using Luke
to check the index, I opened the stored doc, and its contents contain 17
occurrences of the word "herança", for instance. But there's no term for
this word or its stemmed version, "heranc", so searching for this word would
not return this document.

I'm pretty sure I'm missing something in the indexing process:


try {
    doc.add(new Field("contents", docText, Field.Store.YES,
            Field.Index.TOKENIZED, Field.TermVector.YES));
    // gotta improve this later
    IndexWriter writer = new IndexWriter("/java/lucene/portal/cms",
            new BrazilianAnalyzer());
    writer.addDocument(doc);
    writer.close();
}


So, why would these words (and others) not be indexed?

Regards
