As far as I know, no changes are visible to an already-opened reader,
so for the life of that reader, document IDs are unchanged.
Erick
On 5/28/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
Hi again,
On 5/24/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Currently, a deleted doc is removed when th
Hi again,
On 5/24/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
Currently, a deleted doc is removed when the segment containing it is
involved in a segment merge. A merge could be triggered on any
addDocument(), making it difficult to incrementally update anything.
sorry but is the document
I see. Anyway I would update the array when adding a document, so my reader
would be closed then, and just a writer would be accessing the index.
Supposing that no merging is triggered (for this I'm choosing a big
mergeFactor and forcing optimization when a number of documents has been
added) the
Carlos Pita wrote:
Hi all,
Is there any guarantee that the maxDoc returned by a reader will be about
the total number of indexed documents?
It struck me in this thread was that there may be a misunderstanding of the
relationship between numDocs/maxDoc and an IndexReader.
When an IndexReade
Nice, I will write the ids into a byte array with a DataOutputStream and
then marshal that array into a String with a UTF8 encoding. This way there
is no need for parsing or splitting, and the encoding is space efficient.
This marshaled String will be cached with a FieldCache. Thank you for your
s
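
A minimal sketch of the marshaling idea described above, using only the JDK. One caveat: arbitrary bytes are not generally valid UTF-8, so decoding a raw byte array as a UTF-8 String may not round-trip. This sketch therefore swaps in Base64 for the byte-to-String step, at the cost of roughly one third more space. The class and method names are hypothetical, and java.util.Base64 requires Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of Lucene.
final class IdMarshaller {

    // Writes each id as a 4-byte big-endian int, then Base64-encodes the bytes
    // so the result is safe to store as a String field value.
    static String marshal(int[] ids) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(ids.length * 4);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int id : ids) {
                out.writeInt(id);
            }
        } catch (IOException e) {
            // In-memory streams are not expected to fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses marshal(): Base64-decodes, then reads back 4-byte ints.
    static int[] unmarshal(String stored) {
        byte[] raw = Base64.getDecoder().decode(stored);
        int[] ids = new int[raw.length / 4];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            for (int i = 0; i < ids.length; i++) {
                ids[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {7, 300, 70000};
        String stored = marshal(ids);
        System.out.println(stored + " -> " + Arrays.toString(unmarshal(stored)));
    }
}
```

No parsing or splitting is needed on retrieval: the decoded array length is simply the byte count divided by four.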
: Mh, some of my fields are in fact multivalued. But anyway, I could store
: them as a single string and split after retrieval.
: Will FieldCache work for the first search with some query or just for the
: successive ones, for which the fields are already cached?
The first time you access the ca
Mh, some of my fields are in fact multivalued. But anyway, I could store
them as a single string and split after retrieval.
Will FieldCache work for the first search with some query or just for the
successive ones, for which the fields are already cached?
Cheers,
Carlos
On 5/24/07, Chris Hoste
: extremely fast. So I would really like to implement this approach. But I'm
: concerned about what Yonik remarked. I could use a large mergeFactor but
: anyway, just to be sure, is there a way to make the index inform my
: application of merging events?
this entire thread seems to be a discussio
I have done some benchmarks. Keeping things in an array makes the entire
search, including postprocessing from first to last id for a big result set,
extremely fast. So I would really like to implement this approach. But I'm
concerned about what Yonik remarked. I could use a large mergeFactor but
On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
Yes Erick, that's fine. But the fact is that I'm not sure whether the next
added document will have an id equal to maxDoc.
Yes. The highest docId will always be the last document added, and
docIds are never re-arranged with respect to each ot
Yes Erick, that's fine. But the fact is that I'm not sure whether the next
added document will have an id equal to maxDoc. If this is guaranteed, then
I will update the maxDoc slot of my extra data structure upon document
addition and get rid of the hits.id(0) slot upon document deletion. Then,
On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
I need it to update the in-mem structure upon more
fine-grained index changes. Any ideas?
Currently, a deleted doc is removed when the segment containing it is
involved in a segment merge. A merge could be triggered on any
addDocument(), mak
From the Javadoc for IndexReader.
Returns one greater than the largest possible document number. This may be
used to, e.g., determine how big to allocate an array which will have an
element for every document number in an index.
Isn't that what you're wondering about?
Erick
On 5/24/07, Ca
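
To make the maxDoc/numDocs distinction concrete, here is a toy model in plain Java. It is not the Lucene API: ToyIndex and its methods are hypothetical and only mimic the behavior described in this thread, where deleted documents keep their slot until a merge, so a per-document array has to be sized by maxDoc rather than numDocs.

```java
import java.util.BitSet;

// Toy stand-in for an index; hypothetical, not a Lucene class.
final class ToyIndex {
    private int maxDoc = 0;                      // one greater than the largest id handed out
    private final BitSet deleted = new BitSet(); // ids marked deleted but not yet merged away

    // Adds a document and returns its id, which equals the previous maxDoc.
    int addDocument() {
        return maxDoc++;
    }

    // Marks a document deleted; its id slot stays occupied.
    void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("no such doc: " + docId);
        }
        deleted.set(docId);
    }

    int maxDoc() {
        return maxDoc;
    }

    // Live documents only: maxDoc minus the deleted ones.
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < 5; i++) {
            index.addDocument();
        }
        index.delete(1);
        index.delete(3);

        // The side array is indexed by doc id, so it needs maxDoc slots.
        int[] perDocInfo = new int[index.maxDoc()];
        System.out.println("maxDoc=" + index.maxDoc()
                + " numDocs=" + index.numDocs()
                + " arrayLength=" + perDocInfo.length);
    }
}
```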
That's no problem, I can regenerate my entire extra data structure upon
periodic index optimization. That way the array size will be about the size
of the index. What I find more difficult is to know the id of the last
added/removed document. I need it to update the in-mem structure upon more
fin
Document IDs will be re-utilized, after, say, optimization.
One consequence of this is that optimization will change the IDs
of *existing* documents.
You're right that numDocs may well be smaller than maxDoc.
That's what I get for reading quickly...
Best
Erick
On 5/24/07, Carlos Pita <[EMAIL
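
The renumbering effect can be sketched without Lucene. Assuming, as stated elsewhere in this thread, that surviving documents keep their relative order when deleted ones are squeezed out, an old-to-new id mapping lets an external per-document array be rebuilt after an optimize. DocIdCompaction and its methods are hypothetical names used for illustration only.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper illustrating id renumbering; not a Lucene class.
final class DocIdCompaction {
    static final int DELETED = -1;

    // Maps each old id to its new id after deleted slots are removed.
    // Surviving documents keep their relative order.
    static int[] remap(int maxDoc, BitSet deleted) {
        int[] oldToNew = new int[maxDoc];
        int next = 0;
        for (int old = 0; old < maxDoc; old++) {
            oldToNew[old] = deleted.get(old) ? DELETED : next++;
        }
        return oldToNew;
    }

    // Rebuilds a per-document side array so it is indexed by the new ids.
    static int[] rebuild(int[] perDoc, int[] oldToNew) {
        int live = 0;
        for (int id : oldToNew) {
            if (id != DELETED) {
                live++;
            }
        }
        int[] compacted = new int[live];
        for (int old = 0; old < oldToNew.length; old++) {
            if (oldToNew[old] != DELETED) {
                compacted[oldToNew[old]] = perDoc[old];
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        int[] oldToNew = remap(5, deleted);
        int[] perDoc = {10, 11, 12, 13, 14};
        System.out.println(Arrays.toString(oldToNew));
        System.out.println(Arrays.toString(rebuild(perDoc, oldToNew)));
    }
}
```

In this model a document that had id 4 before compaction ends up with id 2 afterwards, which is why any cached id-indexed structure has to be regenerated once deletions are merged away.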
Carlos:
Answer to your last question: No, but if you look in JIRA, Karl Wettin has
written something that has the kind of notification mechanism you are
describing.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
--
No. It will always be at least as large as the total documents. But that
will also count deleted documents.
Do you mean that deleted document ids won't be reutilized, so the index
maxDoc will grow more and more with time? Isn't there any way to compress
the range? It seems strange to me, con
Why wouldn't numdocs serve?
Because the document id (which is the array index) would be in the range 0
... maxDoc and not 0 ... numDocs, wouldn't it?
Cheers,
Carlos
Best
Erick
The motivation of this question is that I want to associate some info to
each document in the index, and in ord
See below...
On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
Hi all,
Is there any guarantee that the maxDoc returned by a reader will be about
the total number of indexed documents?
No. It will always be at least as large as the total documents. But that
will also count deleted documents
19 matches