Re: maxDoc and arrays

2007-05-29 Thread Erick Erickson
As far as I know, no changes are visible to an already-opened reader, so for the life of that reader document IDs are unchanged. Erick On 5/28/07, Carlos Pita <[EMAIL PROTECTED]> wrote: Hi again, On 5/24/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: Currently, a deleted doc is removed when th

Re: maxDoc and arrays

2007-05-28 Thread Carlos Pita
Hi again, On 5/24/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: Currently, a deleted doc is removed when the segment containing it is involved in a segment merge. A merge could be triggered on any addDocument(), making it difficult to incrementally update anything. Sorry, but is the document

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
I see. Anyway I would update the array when adding a document, so my reader would be closed then, and just a writer would be accessing the index. Supposing that no merging is triggered (for this I'm choosing a big mergeFactor and forcing optimization when a number of documents has been added) the

Re: maxDoc and arrays

2007-05-24 Thread Antony Bowesman
Carlos Pita wrote: Hi all, Is there any guarantee that the maxDoc returned by a reader will be about the total number of indexed documents? What struck me in this thread was that there may be a misunderstanding of the relationship between numDocs/maxDoc and an IndexReader. When an IndexReade

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
Nice, I will write the ids into a byte array with a DataOutputStream and then marshal that array into a String with a UTF8 encoding. This way there is no need for parsing or splitting, and the encoding is space efficient. This marshaled String will be cached with a FieldCache. Thank you for your s
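The marshaling idea in this message can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name `IdMarshal` and its methods are hypothetical, and the sketch swaps in ISO-8859-1 instead of the UTF8 mentioned in the message, because decoding arbitrary bytes as UTF-8 can replace invalid sequences and lose data, whereas ISO-8859-1 maps each byte to exactly one char. Whether an index stores such a string unchanged is not checked here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
class IdMarshal {
    /** Writes a count followed by each id as a 4-byte int, then maps bytes 1:1 to chars. */
    static String encode(int[] ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ids.length);
            for (int id : ids) {
                out.writeInt(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: ISO-8859-1 rather than UTF-8, so the round trip is lossless.
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Reverses encode(): reads the count, then that many ints. */
    static int[] decode(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int[] ids = new int[in.readInt()];
            for (int i = 0; i < ids.length; i++) {
                ids[i] = in.readInt();
            }
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ids = {7, 200, -1};
        System.out.println(Arrays.toString(decode(encode(ids))));
    }
}
```

No parsing or splitting is needed on the way back, which is the point made in the message; the cost is that the string is not human-readable.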

Re: maxDoc and arrays

2007-05-24 Thread Chris Hostetter
: Mh, some of my fields are in fact multivalued. But anyway, I could store : them as a single string and split after retrieval. : Will FieldCache work for the first search with some query or just for the : successive ones, for which the fields are already cached? The first time you access the ca

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
Mh, some of my fields are in fact multivalued. But anyway, I could store them as a single string and split after retrieval. Will FieldCache work for the first search with some query or just for the successive ones, for which the fields are already cached? Cheers, Carlos On 5/24/07, Chris Hoste

Re: maxDoc and arrays

2007-05-24 Thread Chris Hostetter
: extremely fast. So I would really like to implement this approach. But I'm : concerned about what Yonik remarked. I could use a large mergeFactor but : anyway, just to be sure, is there a way to make the index inform my : application of merging events? This entire thread seems to be a discussio

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
I have done some benchmarks. Keeping things in an array makes the entire search, including postprocessing from first to last id for a big result set, extremely fast. So I would really like to implement this approach. But I'm concerned about what Yonik remarked. I could use a large mergeFactor but

Re: maxDoc and arrays

2007-05-24 Thread Yonik Seeley
On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote: Yes Erick, that's fine. But the fact is that I'm not sure whether the next added document will have an id equal to maxDoc. Yes. The highest docId will always be the last document added, and docIds are never re-arranged with respect to each ot
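A minimal Java sketch of what this ordering property implies for a parallel per-document array, assuming (as described in this thread) that a merge squeezes out deleted ids while survivors keep their relative order. It is plain Java with no Lucene dependency; `CompactOnMerge`, `compact`, and the sample values are hypothetical, and the whole index is treated as a single compaction, which simplifies how merges actually proceed segment by segment.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper, not part of Lucene.
class CompactOnMerge {
    /** Drops the slots of deleted docs; surviving slots keep their relative order. */
    static int[] compact(int[] perDoc, BitSet deleted) {
        int[] out = new int[perDoc.length];
        int next = 0;
        for (int oldId = 0; oldId < perDoc.length; oldId++) {
            if (!deleted.get(oldId)) {
                out[next++] = perDoc[oldId]; // new id is `next`, always <= oldId
            }
        }
        return Arrays.copyOf(out, next);
    }

    public static void main(String[] args) {
        int[] perDoc = {10, 11, 12, 13}; // assumed extra data for doc ids 0..3
        BitSet deleted = new BitSet();
        deleted.set(1);                  // assumed: doc 1 was deleted
        System.out.println(Arrays.toString(compact(perDoc, deleted)));
    }
}
```

Because the order is preserved, the array can be rebuilt in one pass after a merge instead of being recomputed from scratch, provided the application knows which ids were deleted.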

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
Yes Erick, that's fine. But the fact is that I'm not sure whether the next added document will have an id equal to maxDoc. If this is guaranteed, then I will update the maxDoc slot of my extra data structure upon document addition and get rid of the hits.id(0) slot upon document deletion. Then,

Re: maxDoc and arrays

2007-05-24 Thread Yonik Seeley
On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote: I need it to update the in-mem structure upon more fine-grained index changes. Any ideas? Currently, a deleted doc is removed when the segment containing it is involved in a segment merge. A merge could be triggered on any addDocument(), mak

Re: maxDoc and arrays

2007-05-24 Thread Erick Erickson
From the Javadoc for IndexReader.maxDoc(): "Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index." Isn't that what you're wondering about? Erick On 5/24/07, Ca
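The sizing rule quoted from the Javadoc can be illustrated with a small sketch in plain Java, with no Lucene dependency. The `maxDoc` value and the `deleted` flags are made-up stand-ins for what a reader would report, and `PerDocArray` with its methods is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene.
class PerDocArray {
    /** Allocates one slot per possible document number, i.e. maxDoc slots. */
    static int[] allocate(int maxDoc) {
        return new int[maxDoc];
    }

    /** Live documents: maxDoc minus the deleted ones (the figure numDocs reports). */
    static int liveDocs(int maxDoc, BitSet deleted) {
        return maxDoc - deleted.cardinality();
    }

    public static void main(String[] args) {
        int maxDoc = 5;                  // assumed: ids 0..4 have been handed out
        BitSet deleted = new BitSet(maxDoc);
        deleted.set(1);                  // assumed: doc 1 deleted, not yet merged away

        int[] extra = allocate(maxDoc);  // sized by maxDoc, not by numDocs
        extra[maxDoc - 1] = 42;          // the highest id is still addressable

        System.out.println("slots=" + extra.length + " live=" + liveDocs(maxDoc, deleted));
    }
}
```

Sizing by the live count instead would leave the highest document ids out of bounds whenever deletions are pending.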

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
That's no problem, I can regenerate my entire extra data structure upon periodic index optimization. That way the array size will be about the size of the index. What I find more difficult is to know the id of the last added/removed document. I need it to update the in-mem structure upon more fin

Re: maxDoc and arrays

2007-05-24 Thread Erick Erickson
Document IDs will be reused after, say, optimization. One consequence of this is that optimization will change the IDs of *existing* documents. You're right that numDocs may well be smaller than maxDoc. That's what I get for reading quickly... Best Erick On 5/24/07, Carlos Pita <[EMAIL

Re: maxDoc and arrays

2007-05-24 Thread Otis Gospodnetic
----- Original Message ----- From: Carlos Pita <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, May 24, 2007 12:41:11 PM Subject: maxDoc and arrays Hi all, Is there any guarantee that the maxDoc returned by a reader will be about the total number of indexed documents

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
No. It will always be at least as large as the total documents. But that will also count deleted documents. Do you mean that deleted document ids won't be reused, so the index maxDoc will grow more and more with time? Isn't there any way to compress the range? It seems strange to me, con

Re: maxDoc and arrays

2007-05-24 Thread Carlos Pita
Why wouldn't numDocs serve? Because the document id (which is the array index) would be in the range 0 ... maxDoc and not 0 ... numDocs, wouldn't it? Cheers, Carlos Best Erick The motivation of this question is that I want to associate some info to each document in the index, and in ord

Re: maxDoc and arrays

2007-05-24 Thread Erick Erickson
See below... On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote: Hi all, Is there any guarantee that the maxDoc returned by a reader will be about the total number of indexed documents? No. It will always be at least as large as the total documents. But that will also count deleted documents

maxDoc and arrays

2007-05-24 Thread Carlos Pita
Hi all, Is there any guarantee that the maxDoc returned by a reader will be about the total number of indexed documents? The motivation for this question is that I want to associate some info with each document in the index, and in order to access this additional data in O(1) I would like to do this