Daniel Noll wrote:
On Monday 17 March 2008 19:38:46 Michael McCandless wrote:
Well ... expungeDeletes() first forces a flush, at which point the
deletions are flushed as a .del file against the just flushed
segment. Still, if you call expungeDeletes after every flush
(commit) then it's only 1
On Monday 17 March 2008 19:38:46 Michael McCandless wrote:
> Well ... expungeDeletes() first forces a flush, at which point the
> deletions are flushed as a .del file against the just flushed
> segment. Still, if you call expungeDeletes after every flush
> (commit) then it's only 1 segment whose d
Daniel Noll wrote:
On Thursday 13 March 2008 19:46:20 Michael McCandless wrote:
But, when a normal merge of segments with deletions completes, your
docIDs will shift. In trunk we now explicitly compute the docID
shifting that happens after a merge, because we don't always flush
pending delete
On Thursday 13 March 2008 19:46:20 Michael McCandless wrote:
> But, when a normal merge of segments with deletions completes, your
> docIDs will shift. In trunk we now explicitly compute the docID
> shifting that happens after a merge, because we don't always flush
> pending deletes when flushing
Daniel Noll wrote:
For interest's sake I also timed fetching the document with no FieldSelector,
that takes around 410ms for the same documents. So there is still a big
benefit in using the field selector, it just isn't anywhere near enough to
get it close to the time it takes to retrieve th
On Thu, Mar 13, 2008 at 9:30 PM, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
> I suspect this can be related to the problem you see though I am not sure.
> Could you try with the patch there?
> Thanks,
> Doron
Daniel, I was wrong about
Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
I suspect this can be related to the problem you see though I am not sure.
Could you try with the patch there?
Thanks,
Doron
On Thu, Mar 13, 2008 at 10:46 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Daniel Noll wrote:
Daniel Noll wrote:
On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote:
OK, I think very likely this is the issue: when IndexWriter hits an
exception while processing a document, the portion of the document
already indexed is left in the index, and then its docID is marked
for deletio
On Thursday 13 March 2008 00:42:59 Erick Erickson wrote:
> I certainly found that lazy loading changed my speed dramatically, but
> that was on a particularly field-heavy index.
>
> I wonder if TermEnum/TermDocs would be fast enough on an indexed
> (UN_TOKENIZED???) field for a unique id.
>
> Mostl
On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote:
> OK, I think very likely this is the issue: when IndexWriter hits an
> exception while processing a document, the portion of the document
> already indexed is left in the index, and then its docID is marked
> for deletion. You can see
I certainly found that lazy loading changed my speed dramatically, but
that was on a particularly field-heavy index.
I wonder if TermEnum/TermDocs would be fast enough on an indexed
(UN_TOKENIZED???) field for a unique id.
Mostly, I'm hoping you'll try this and tell me if it works so I don't have
Daniel Noll wrote:
I have filtered out lines in the log which indicated an exception adding the
document; these occur when our Reader throws an IOException and there were so
many that it bloated the file.
OK, I think very likely this is the issue: when IndexWriter hits an
exception whil
On Wednesday 12 March 2008 10:20:12 Michael McCandless wrote:
> Oh, so you do not see the problem with SerialMergeScheduler but you
> do with ConcurrentMergeScheduler?
[...]
> Oh, there are no deletions? Then this is very strange. Is it
> optimize that messes up the docIDs? Or, is it when you
On Wednesday 12 March 2008 09:53:58 Erick Erickson wrote:
> But to me, it always seems...er...fraught to even *think* about relying
> on doc ids. I know you've been around the block with Lucene, but do you
> have a compelling reason to use the doc ID and not your own unique ID?
From memory it was
Daniel Noll wrote:
On Tuesday 11 March 2008 19:55:39 Michael McCandless wrote:
Hi Daniel,
2.3 should be no different from 2.2 in that docIDs only "shift" when
a merge of segments with deletions completes.
Could it be the ConcurrentMergeScheduler? Merges now run in the
background by default
But to me, it always seems...er...fraught to even *think* about relying
on doc ids. I know you've been around the block with Lucene, but do you
have a compelling reason to use the doc ID and not your own unique ID?
Best
Erick
On Tue, Mar 11, 2008 at 5:39 PM, Daniel Noll <[EMAIL PROTECTED]> wrote:
On Tuesday 11 March 2008 19:55:39 Michael McCandless wrote:
> Hi Daniel,
>
> 2.3 should be no different from 2.2 in that docIDs only "shift" when
> a merge of segments with deletions completes.
>
> Could it be the ConcurrentMergeScheduler? Merges now run in the
> background by default and commit w
Hi Daniel,
2.3 should be no different from 2.2 in that docIDs only "shift" when
a merge of segments with deletions completes.
Could it be the ConcurrentMergeScheduler? Merges now run in the
background by default and commit whenever they complete. You can get
back to the previous (block
DocIDs change whenever segments that had deletes pending get merged.
So if you have no deletions, docIDs won't ever change.
Mike
Cam Bazz wrote:
Hello;
If no document is ever deleted or updated in an index, will the document
id change? Under which circumstances will the document ids c
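
To make the renumbering described above concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class DocIdShift and the method remapAfterMerge are hypothetical names invented for this illustration, and the boolean array is only a stand-in for a segment's deletion marks. It assumes that surviving documents keep their relative order and are packed together once deleted documents are dropped during a merge.

```java
import java.util.Arrays;

public class DocIdShift {
    /**
     * Returns an array where result[oldId] is the docID after the merge.
     * Deleted documents map to -1. (Hypothetical helper, not a Lucene call.)
     */
    public static int[] remapAfterMerge(boolean[] deleted) {
        int[] newIds = new int[deleted.length];
        int next = 0; // next free docID in the merged segment
        for (int oldId = 0; oldId < deleted.length; oldId++) {
            newIds[oldId] = deleted[oldId] ? -1 : next++;
        }
        return newIds;
    }

    public static void main(String[] args) {
        // Five documents with docIDs 0..4; documents 1 and 3 are marked deleted.
        boolean[] deleted = {false, true, false, true, false};
        System.out.println(Arrays.toString(remapAfterMerge(deleted)));
        // prints [0, -1, 1, -1, 2]

        // With no deletions the mapping is the identity, so docIDs never change.
        System.out.println(Arrays.toString(remapAfterMerge(new boolean[3])));
        // prints [0, 1, 2]
    }
}
```

Each surviving document moves down by the number of deleted documents in front of it, which is why an index with no deletions keeps its docIDs.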
: The simple question - I have a document and I add it into index with
: TermVector support.
: How can I simply retrieve the TermVector information for the document?
:
: TermFreqVector vector = reader.getTermFreqVector(document)?
: reader.delete(document);
: Etc..
Open an IndexReader,
Hi,
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> For a domain-centric identifier, use a custom field to store
> (and index perhaps?) it. Lucene's Document id's are internal
> and not controllable.
Unfortunately, Lucene contains APIs that are strongly tied to the internal id :(
For example -
Hi!
Is there any way to force the document id inside the Lucene index? I have
my own internal numbering scheme, and it would be nice to have that
reflected inside the Lucene index... anyway?
Simply put your ID as an additional field in your document. You should
never rely on Lucene's document id as
On Jun 24, 2005, at 3:08 PM, Yousef Ourabi wrote:
Hello:
Is there any way to force the document id inside the Lucene index? I have
my own internal numbering scheme, and it would be nice to have that
reflected inside the Lucene index... anyway?
For a domain-centric identifier, use a custom field
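
As a rough illustration of why a custom identifier field is the safer handle, here is a toy sketch in plain Java. It is not Lucene code: StableIdLookup, add, deleteAndMerge and docIdFor are hypothetical names made up for this example, and a list position stands in for the internal document id. The assumption is simply that internal ids are positional and get repacked when deleted documents are merged away.

```java
import java.util.ArrayList;
import java.util.List;

public class StableIdLookup {
    // Toy stand-in for an index: a document's position in the list plays
    // the role of its internal docID; the string is the application's own ID.
    private final List<String> idField = new ArrayList<>();

    public void add(String appId) {
        idField.add(appId);
    }

    /** Simulates deleting the document at docId and merging the gap away. */
    public void deleteAndMerge(int docId) {
        idField.remove(docId); // remove(int index): later documents shift down
    }

    /** Returns the current position of the document with this application ID, or -1. */
    public int docIdFor(String appId) {
        return idField.indexOf(appId);
    }

    public static void main(String[] args) {
        StableIdLookup index = new StableIdLookup();
        index.add("A-100");
        index.add("A-200");
        index.add("A-300");

        int before = index.docIdFor("A-300");
        index.deleteAndMerge(0); // drop "A-100"
        int after = index.docIdFor("A-300");

        System.out.println(before + " -> " + after); // prints 2 -> 1
    }
}
```

The positional id of "A-300" moves from 2 to 1, while a lookup by the application's own ID still finds the same document.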