How to open IndexWriter to append document?

2007-01-14 Thread David
Hi all: I want to first erase the original index and then create an index for appending. I use the following Python code with the PyLucene port:
    def store(doc):
        store = PyLucene.FSDirectory.getDirectory("index", True)
        writer = PyLucene.IndexWriter(store, StandardAnalyzer, False ) # e

Re: How to retrieve the document by document ID?

2007-01-14 Thread Doron Cohen
David <[EMAIL PROTECTED]> wrote on 14/01/2007 20:08:05:
> Thanks. How does Lucene give each document an ID when the document is added?
> Is the document ID unchanged until the document is deleted?
Not exactly. When the first doc is added, it is assigned id 0. The next one is assigned id 1, etc. When a
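
The numbering described in this reply can be illustrated with a toy model. This is only a sketch and not Lucene code: the class `ToyIndex` and its methods are hypothetical names. The message is cut off above, so the renumbering step is an assumption about where it was heading, based on the general Lucene behaviour that deleted documents are dropped when segments are merged and the remaining documents are renumbered.

```python
class ToyIndex:
    """Toy model of sequential document numbering (hypothetical, not Lucene)."""

    def __init__(self):
        self.docs = []        # a document's position in this list is its ID
        self.deleted = set()  # IDs marked as deleted but not yet removed

    def add(self, doc):
        # The first document gets ID 0, the next one ID 1, and so on.
        self.docs.append(doc)
        return len(self.docs) - 1

    def delete(self, doc_id):
        # Deleting only marks the document; other IDs are unaffected for now.
        self.deleted.add(doc_id)

    def merge(self):
        # Assumed behaviour: deleted documents are dropped and the
        # survivors are renumbered without gaps.
        self.docs = [d for i, d in enumerate(self.docs) if i not in self.deleted]
        self.deleted.clear()

    def doc_id_of(self, doc):
        # Linear scan; returns None if the document is absent or deleted.
        for i, d in enumerate(self.docs):
            if i not in self.deleted and d == doc:
                return i
        return None
```

In this model a document keeps its ID after a neighbour is deleted, and only a later merge shifts it, which is why an ID is not a safe long-term key.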

Re: Perform indexing and searching concurrently

2007-01-14 Thread Chris Hostetter
: I'm wondering what will happen if I perform indexing while 10 people are searching at the same time? Can I retrieve results while I index, and the other way around?
From the FAQ http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-6c56b0449d114826586940dcc6fe5158267

Re: How to retrieve the document by document ID?

2007-01-14 Thread David
Thanks. How does Lucene give each document an ID when the document is added? Is the document ID unchanged until the document is deleted? 2007/1/12, Otis Gospodnetic <[EMAIL PROTECTED]>: David, please look at the Javadoc for IndexReader. I believe the API is reader.document(int), where reader is

Perform indexing and searching concurrently

2007-01-14 Thread spinergywmy
Hi, I'm wondering what will happen if I perform indexing while 10 people are searching at the same time? Can I retrieve results while I index, and the other way around? Thanks. Regards, Wooi Meng

Re: Using Lucene to index a web forum

2007-01-14 Thread Nicolas Lalevée
On Saturday 13 January 2007 16:48, Melange wrote:
> Nicolas Lalevée-2 wrote:
> > On Saturday 13 January 2007 10:49, Melange wrote:
> >> Hello, I'd like to index a web forum (phpBB) with Lucene. I wonder how
> >> to best map the forum document model (topics and their messages) to the
> >> Lucene
>

Re: Making document numbers persistent

2007-01-14 Thread karl wettin
14 Jan 2007 at 17:46, Erick Erickson wrote: Map size, 10,000,000 pairs. Looking up 1,000,000 user ids and setting them in a bitset. Total time to set all the bits, 1.016 seconds. Running inside of Eclipse on a 2700 MH AMD with 1G memory (and I used up almost all this memory, but made no

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 17:46 , Erick Erickson wrote: I just love it when I get so wrapped up in a particular approach that alternatives don't occur to me. So I wondered what would happen if I just got stupid simple and tried solving what I think is your problem without involving lucene. So,

Re: Making document numbers persistent

2007-01-14 Thread Erick Erickson
I just love it when I get so wrapped up in a particular approach that alternatives don't occur to me. So I wondered what would happen if I just got stupid simple and tried solving what I think is your problem without involving lucene. So, I wrote a little program to fill up a HashMap with pairs,
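
An experiment of this kind can be approximated with a short sketch. The message is truncated, so the details are assumptions: the key format (`user0`, `user1`, ...), the function names, and the idea that the mapped values are dense integers from 0 to n-1 are all illustrative, and a Python dict stands in for the Java HashMap.

```python
import random
import time


def build_map(n_pairs):
    # Hypothetical mapping from a user id string to an integer doc number.
    return {f"user{i}": i for i in range(n_pairs)}


def set_bits(id_map, user_ids):
    # One bit per possible value; assumes values lie in range(len(id_map)).
    bits = bytearray((len(id_map) + 7) // 8)
    for uid in user_ids:
        n = id_map.get(uid)
        if n is not None:  # unknown user ids are skipped
            bits[n >> 3] |= 1 << (n & 7)
    return bits


def is_set(bits, n):
    # True if bit n was set by set_bits.
    return bool(bits[n >> 3] & (1 << (n & 7)))


if __name__ == "__main__":
    # Sizes are scaled down from the 10,000,000 / 1,000,000 in the thread.
    id_map = build_map(1_000_000)
    wanted = [f"user{random.randrange(1_000_000)}" for _ in range(100_000)]
    started = time.perf_counter()
    bits = set_bits(id_map, wanted)
    print(f"set bits in {time.perf_counter() - started:.3f} seconds")
```

Timings from a sketch like this depend on the machine and the runtime, so they are not comparable with the figures quoted in the thread.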

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 3:54 , Erick Erickson wrote: 3> I doubt it really will make a performance difference, but you could use TermDocs.seek rather than get a new termdocs for each term from the reader. (and if this *does* make a difference, please let me know) It seems it does. I have just

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 8:51 , Doron Cohen wrote: I think that one effective way to control docid changes, assuming delete/update rate significantly lower than add rate, is to modify Lucene such that deleted docs are only 'squeezed out' when calling optimize(). This would involve delicate cha

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 10:58 , karl wettin wrote: In the original post you mention 2-10 million documents. How much is that in bytes? On my development machine I have 1.5 million documents and those are weighing in at ~950MB. I suspect that for production we will add more fields, so it woul

Re: Making document numbers persistent

2007-01-14 Thread Kay Roepke
On 14. Jan 2007, at 7:10 , Chris Hostetter wrote: if you're talking about multiple identical servers used for load balancing, then there is no reason why those indexes wouldn't be kept in sync (the merge model is deterministic, so if you apply the same operations to every server in the same

Re: Making document numbers persistent

2007-01-14 Thread karl wettin
14 Jan 2007 at 02:14, Kay Roepke wrote: If I was you, I would make a filter that navigates an in heap object graph of all users and their connections using a breadth first (or perhaps even A*). I would essentially have the same problem with an in-memory graph: I cannot be sure of the Luc
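
The breadth-first navigation suggested in the quoted text could look roughly like the following. This is a minimal sketch under stated assumptions: the graph is a plain dict from a user to the users they are connected to, and the function name `users_within` and the `max_hops` limit are hypothetical, not something taken from the thread or from Lucene.

```python
from collections import deque


def users_within(graph, start, max_hops):
    """Breadth-first walk returning {user: hops} for users within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        hops = seen[user]
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:  # first visit is the shortest path
                seen[friend] = hops + 1
                queue.append(friend)
    return seen
```

The resulting set of users would still have to be translated into Lucene document numbers before it could act as a filter, which is the step the thread is concerned with.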