Re: indexing api wrt Analyzer

2008-03-13 Thread John Wang
Excellent! Exactly what I was looking for! Thanks Grant!
-John
On Thu, Mar 13, 2008 at 5:39 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> There is an addDocument method that takes an Analyzer and overrides
> the one used at construction of the IndexWriter. See
>
> http://lucene.apache.org/j

Re: indexing api wrt Analyzer

2008-03-13 Thread Grant Ingersoll
There is an addDocument method that takes an Analyzer and overrides the one used at construction of the IndexWriter. See http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document,%20org.apache.lucene.analysis.Analyzer)

Re: indexing api wrt Analyzer

2008-03-13 Thread John Wang
Hi Grant: For our corpus, we don't rely much on idf in the scoring calculation, so I don't see that being much of a problem. About performance: instantiating one IndexWriter for a batch of, say, 1000 docs (e.g. iterating over the 1000 docs and calling addDocument), compared with instantiating and clo

Re: indexing api wrt Analyzer

2008-03-13 Thread Grant Ingersoll
On Mar 13, 2008, at 11:03 AM, John Wang wrote:
> Yes, but usually it's a good idea to add documents in batches rather than reinstantiating the writer for every document and then closing it. It would be nice if one could specify to the writer which analyzer to use. PerFieldAnalyzerWrapper wouldn't

Re: indexing api wrt Analyzer

2008-03-13 Thread Grant Ingersoll
On Mar 13, 2008, at 11:03 AM, John Wang wrote:
> Yes, but usually it's a good idea to add documents in batches rather than reinstantiating the writer for every document and then closing it.
Why does what I suggested require instantiating a new writer for every document? It uses the anal

Re: indexing api wrt Analyzer

2008-03-13 Thread John Wang
Yes, but usually it's a good idea to add documents in batches rather than reinstantiating the writer for every document and then closing it. It would be nice if one could specify to the writer which analyzer to use. PerFieldAnalyzerWrapper wouldn't work because different analyzers may apply to the same

Re: indexing api wrt Analyzer

2008-03-13 Thread Grant Ingersoll
On IndexWriter, you can pass in the Analyzer when you add a Document, so your application can identify the language, choose the analyzer for the given doc, and then add the document. See: public void addDocument(Document doc, Analyzer analyzer)
On Mar 12, 2008, at 8:40 PM, John Wang wrote:
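The selection step described above (identify the language, choose the analyzer, then add the document) could be sketched roughly as follows. This is only a sketch under assumptions, not Lucene code: `TextAnalyzer` and `AnalyzerSelector` are hypothetical names invented here so that the example runs without the Lucene jar, and the language codes are made up. In a real application the selected object would be a Lucene Analyzer passed to addDocument(Document doc, Analyzer analyzer).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for org.apache.lucene.analysis.Analyzer,
// used only so this sketch compiles without Lucene on the classpath.
interface TextAnalyzer {
    String name();
}

// Hypothetical helper: maps the value of a document's language field
// to the analyzer that should be used for that document.
class AnalyzerSelector {
    private final Map<String, TextAnalyzer> byLanguage = new HashMap<>();
    private final TextAnalyzer fallback;

    AnalyzerSelector(TextAnalyzer fallback) {
        this.fallback = fallback;
    }

    // Register an analyzer for a language code (stored case-insensitively).
    void register(String language, TextAnalyzer analyzer) {
        byLanguage.put(language.toLowerCase(Locale.ROOT), analyzer);
    }

    // Return the analyzer for the language, or the fallback when the
    // language is missing or has no registered analyzer.
    TextAnalyzer select(String language) {
        if (language == null) {
            return fallback;
        }
        return byLanguage.getOrDefault(language.toLowerCase(Locale.ROOT), fallback);
    }
}
```

With a real IndexWriter, the loop over a batch would keep a single writer open and, for each document, call something like writer.addDocument(doc, selector.select(lang)), where lang is read from the document's language field and the selector holds Lucene Analyzer instances instead of the stand-in type.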

Re: indexing api wrt Analyzer

2008-03-12 Thread Daniel Noll
On Thursday 13 March 2008 15:21:19 Asgeir Frimannsson wrote:
> > I was hoping to have IndexWriter take an AnalyzerFactory, where the
> > AnalyzerFactory produces an Analyzer depending on some criteria of the
> > document, e.g. language.
> With PerFieldAnalyzerWrapper, you can specify which analyze

Re: indexing api wrt Analyzer

2008-03-12 Thread Asgeir Frimannsson
On Thu, Mar 13, 2008 at 10:40 AM, John Wang <[EMAIL PROTECTED]> wrote:
> Hi all:
>
> Maybe this has been asked before:
>
> I am building an index that consists of multiple languages (stored as a
> field), and I have different analyzers depending on the language
> to be indexed. B