efficient ways of updating document

2007-01-04 Thread John Song
It seems to me that updating a document is rather tedious and slow in Lucene, especially when updating a large number of documents. Before opening an IndexWriter to add documents, one has to open an IndexReader/IndexSearcher to search for the document of a particular id. Upon finding its docnum,

Re: digester/lucene runtime problems

2007-01-04 Thread Otis Gospodnetic
Mark, this is purely related to Digester, not Lucene. It's hard to tell which method Digester is trying to call, as it uses reflection and the exception stack trace doesn't show enough. My guess is it always happens on the same XML element, if you run this multiple times. Add some print calls

Re: Enhance similarity to pass in field name (Reposting)

2007-01-04 Thread Chris Hostetter
: However, we don't have access to the field name in the function public float : tf(float freq). : Is there any way out ? I would think passing the field name along with the : freq to : create public float tf(float freq, String fieldName) would be useful - : unless there is a better way. there

Enhance similarity to pass in field name (Reposting)

2007-01-04 Thread escher2k
Hi, I am trying to create a linear function to influence the similarity computation. For example - if tf = 4, f(tf) = 150 * 1 + 150 * 0.3 = 195 The first occurrence is multiplied by 150. The next three occurrences are multiplied by 150 and divided by 10 (3/10). Ho
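
The arithmetic in the post above can be sketched in plain Java. This is only an illustration of the formula as stated (150 for the first occurrence, 150/10 for each later one). The class name `LinearTf`, the method `score`, the two constants, and the choice to return 0 when tf is 0 or negative are all assumptions made here; none of them is part of Lucene's Similarity API or of the original message.

```java
// Hypothetical helper, not part of Lucene: the linear term-frequency
// function described in the post, written as a standalone method.
final class LinearTf {
    // Weight given to the first occurrence of the term (from the post).
    static final float FIRST_WEIGHT = 150f;
    // Each later occurrence contributes FIRST_WEIGHT / REPEAT_DIVISOR (from the post).
    static final float REPEAT_DIVISOR = 10f;

    // Returns f(tf). Treating tf <= 0 as a score of 0 is an assumption.
    static float score(int tf) {
        if (tf <= 0) {
            return 0f;
        }
        return FIRST_WEIGHT + (tf - 1) * (FIRST_WEIGHT / REPEAT_DIVISOR);
    }

    public static void main(String[] args) {
        // tf = 4: 150 * 1 + 150 * 0.3 = 195
        System.out.println(score(4));
    }
}
```

Whether such a function can vary per field depends on the field name being available to the tf computation, which is the question the thread raises.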

Re: lucene scalability questions

2007-01-04 Thread Russ
If you do this on windows, you might be able to replicate the indexes using DFS. On linux you can probably use rsync to keep the different servers up to date. If the size of the index is an issue, lustre could be used to have one volume that's spread over many servers. Performance is supposed

Hence Similarity to pass in fieldName...

2007-01-04 Thread escher2k
Hi, I am trying to create a linear function to influence the similarity computation. For example - if tf = 4, f(tf) = 150 * 1 + 150 * 0.3 = 195 The first occurrence is multiplied by 150. The next three occurrences are multiplied by 150 and divided by 10 (3/10). Ho

Re: lucene scalability questions

2007-01-04 Thread Peter W.
Mark, My understanding of Lucene is limited, but the issues seem similar to web server farms in that it comes down to linear scalability by adding more boxes. This means separate machines with their own indexes. Shared filesystems such as NFS work well in smaller environments but experience pro

Re: getting the maximum Hits doc

2007-01-04 Thread Dennis Kubes
Hits should be sorted according to score. Getting the first document should give you the one with the highest score. Dennis Nils Höller wrote: Hi, this is a short beginner question: I am searching for something in my program Hits hits = MySearcher.search(queryStr, searchRes.indexPath); N

getting the maximum Hits doc

2007-01-04 Thread Nils Höller
Hi, this is a short beginner question: I am searching for something in my program Hits hits = MySearcher.search(queryStr, searchRes.indexPath); Now I only want the Document with the highest score. Is there a better way than iterating through all hits? The Hits object seems to be not sorted.

lucene scalability questions

2007-01-04 Thread Mark Mei
So this question has two parts: 1. How does Lucene scale, exactly? Do we distribute the index to multiple servers somehow? Or is it one index, sitting on some sort of a shared filesystem, shared by all Lucene servers? If it's the latter, the bottleneck will be I/O ... anyway, elaborate on scalabi

Re: Lucene index update

2007-01-04 Thread Erick Erickson
For approach <2>, I *think* you can extract information about unstored data by playing with TermDocs/TermEnums. Conceptually, the idea is to go through all the terms and, for document (Lucene ID) 1, find the terms that appear in document 1 and order them by their term positions. Repeat for document

Lucene index update

2007-01-04 Thread Ivan Vasilev
Hi All, I want to update some documents in existing indexes by adding a new field to each of their documents. The documents contained in the indexes have some fields that are indexed and NOT stored. The new field that will be added will contain some metadata and will be Stored and not indexe

Re: obscure error...

2007-01-04 Thread Dan Armbrust
It turns out, this is somehow related to an interaction between SWT and the java Decompresser class - certainly not lucene related. FYI: https://bugs.eclipse.org/bugs/show_bug.cgi?id=169484 -- Daniel Armbrust Biomedical Informatics Mayo Clinic Rochester daniel.arm

Re: Query

2007-01-04 Thread Erik Hatcher
This is not a Lucene issue, but rather a 3rd party tool you're using, which seems to have instrumented your Java runtime. Googling for "com.trend.iwss.jscan.appscan" turned up a lot of similar issues:

Query

2007-01-04 Thread maarsh
Hi, I am using Lucene 2.0.0 with JRE 1.4.2_03. It is a simple program in which I am indexing an XML file, but when I run it, I get this error: java.lang.NoSuchMethodError: com.trend.iwss.jscan.appscan.runtime.PolicyProps: method ()V not found at com.trend.iwss.jscan.appscan.runtime.Session.(S