ote:
> I'm new to Lucene and trying to understand the concept of unique document
> id, something like a primary key in databases like sql or sqlite etc.
> While searching, I came across this article:
> https://blog.mikemccandless.com/2014/05/choosing-which actually
>
I'm new to Lucene and trying to understand the concept of unique document
id, something like a primary key in databases like sql or sqlite etc.
While searching, I came across this article:
https://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
which actually
able to gather that it checks term positions; therefore,
> I thought that it could be possible to find the term by the document id and
> the position. However, I was not able to get much further.
> `TermsEnum#ord()` (
> https://lucene.apache.org/core/7_2_1/core/org/apache/lucene/index/TermsE
ion of the PhraseQuery
I was able to gather that it checks term positions; therefore, I thought that
it could be possible to find the term by the document id and the position.
However, I was not able to get much further. `TermsEnum#ord()`
(https://lucene.apache.org/core/7_2_1/core/org/apache/lu
Daniel Noll wrote:
On Monday 17 March 2008 19:38:46 Michael McCandless wrote:
Well ... expungeDeletes() first forces a flush, at which point the
deletions are flushed as a .del file against the just flushed
segment. Still, if you call expungeDeletes after every flush
(commit) then it's only 1
On Monday 17 March 2008 19:38:46 Michael McCandless wrote:
> Well ... expungeDeletes() first forces a flush, at which point the
> deletions are flushed as a .del file against the just flushed
> segment. Still, if you call expungeDeletes after every flush
> (commit) then it's only 1 segment whose d
Daniel Noll wrote:
On Thursday 13 March 2008 19:46:20 Michael McCandless wrote:
But, when a normal merge of segments with deletions completes, your
docIDs will shift. In trunk we now explicitly compute the docID
shifting that happens after a merge, because we don't always flush
pending delete
On Thursday 13 March 2008 19:46:20 Michael McCandless wrote:
> But, when a normal merge of segments with deletions completes, your
> docIDs will shift. In trunk we now explicitly compute the docID
> shifting that happens after a merge, because we don't always flush
> pending deletes when flushing
Daniel Noll wrote:
For interest's sake I also timed fetching the document with no FieldSelector,
that takes around 410ms for the same documents. So there is still a big
benefit in using the field selector, it just isn't anywhere near enough to
get it close to the time it takes to retrieve th
On Thu, Mar 13, 2008 at 9:30 PM, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
> I suspect this can be related to the problem you see though I am not sure.
> Could you try with the patch there?
> Thanks,
> Doron
Daniel, I was wrong about
ally an
> IOException)? I thought something was going wrong in retrieving or
> tokenizing the document.
>
> I don't think flush() helps because it just flushes the pending
> deletes as well?
>
> > - use ++ to determine the next document ID instead of
> > index.getWriter(
What exceptions are you actually hitting (is it really an
IOException)? I thought something was going wrong in retrieving or
tokenizing the document.
I don't think flush() helps because it just flushes the pending
deletes as well?
- use ++ to determine the next document ID
On Thursday 13 March 2008 00:42:59 Erick Erickson wrote:
> I certainly found that lazy loading changed my speed dramatically, but
> that was on a particularly field-heavy index.
>
> I wonder if TermEnum/TermDocs would be fast enough on an indexed
> (UN_TOKENIZED???) field for a unique id.
>
> Mostl
- use ++ to determine the next document ID instead of
index.getWriter().docCount() (out of sync after an error but fixes itself
on optimize()).
- Use a field for a separate ID (slower later when reading the index)
- ???
Daniel
I certainly found that lazy loading changed my speed dramatically, but
that was on a particularly field-heavy index.
I wonder if TermEnum/TermDocs would be fast enough on an indexed
(UN_TOKENIZED???) field for a unique id.
Mostly, I'm hoping you'll try this and tell me if it works so I don't have
Daniel Noll wrote:
I have filtered out lines in the log which indicated an exception adding the
document; these occur when our Reader throws an IOException and there were so
many that it bloated the file.
OK, I think very likely this is the issue: when IndexWriter hits an
exception whil
On Wednesday 12 March 2008 10:20:12 Michael McCandless wrote:
> Oh, so you do not see the problem with SerialMergeScheduler but you
> do with ConcurrentMergeScheduler?
[...]
> Oh, there are no deletions? Then this is very strange. Is it
> optimize that messes up the docIDs? Or, is it when you
On Wednesday 12 March 2008 09:53:58 Erick Erickson wrote:
> But to me, it always seems...er...fraught to even *think* about relying
> on doc ids. I know you've been around the block with Lucene, but do you
> have a compelling reason to use the doc ID and not your own unique ID?
From memory it was
cations is relying on docIDs?
As far as that, we assume that if there are N documents in the index then the
next document ID will be N (we determine this before adding the document.)
As we're only doing this in a single thread and we never delete documents,
this was previously safe.
Oh,
reorders the segments?
>
> > If it's not that ... can you provide more details about how your
> > application is relying on docIDs?
>
> As far as that, we assume that if there are N documents in the index then
> the
> next document ID will be N (we de
pplications is relying on docIDs?
As far as that, we assume that if there are N documents in the index then the
next document ID will be N (we determine this before adding the document.)
As we're only doing this in a single
e previous (blocking) behavior by using
SerialMergeScheduler instead.
If it's not that ... can you provide more details about how your
application is relying on docIDs?
Mike
Daniel Noll wrote:
Hi all.
We're using the document ID to associate extra information stored outside
L
Hi all.
We're using the document ID to associate extra information stored outside
Lucene. Some of this information is being stored at load-time and some
afterwards; later on it turns out the information stored at load-time is
returning the wrong results when converting the database con
There was a thread on this exact issue. If you are using 2.3, the payload API
would help with that.
-John
On Thu, Feb 14, 2008 at 3:59 AM, Gauri Shankar <[EMAIL PROTECTED]>
wrote:
> Thanks a lot for both of you.
>
> yes, I am talking about internally assigned document id.
>
t 6:59 AM, Gauri Shankar <[EMAIL PROTECTED]>
wrote:
> Thanks a lot for both of you.
>
> yes, I am talking about internally assigned document id.
>
> Erick : I am already using the unique id into the index mapped to one of
> our
> DB's primary key to uniquely identify the docs
Thanks a lot for both of you.
yes, I am talking about internally assigned document id.
Erick : I am already using the unique id into the index mapped to one of our
DB's primary key to uniquely identify the docs from index. Now to get the
value of this unique field i need to call getDo
If you're referring to the internally-assigned document id, I don't think
there is a way. Assuming you're trying to assign one yourself or some
such.
From all the discussions I've seen, I don't think there's even a faint
possibility that controlling this w
Add a field to your document.
document.add(new Field("id", idString, Field.Store.YES, Field.Index.UN_TOKENIZED));
Or something like that. (Don't have the doc handy right now).
Hope this helps.
Patrick
On Feb 9, 2008 7:38 AM, Gauri Shankar <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I would like to get the control over the docId field from my
Hi,
I would like to get the control over the docId field from my code. Can
anyone suggest some way for doing the same?
--
Warm Regards,
Gauri Shankar
DocIDs change whenever segments that had deletes pending, get merged.
So if you have no deletions, docIDs won't ever change.
Mike
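To make the point about merges concrete, here is a minimal sketch in plain Java. It is a toy model only, not Lucene code: the class and method names (DocIdShiftSketch, remapAfterMerge) are made up for illustration, and it simply assumes that a merge drops deleted documents and renumbers the survivors in their original order.

```java
import java.util.Arrays;

// Toy model only: this does not use or reproduce the Lucene API.
class DocIdShiftSketch {

    // For each old docID, return its docID after a merge that drops deleted
    // documents and renumbers the rest in order (-1 means the document is gone).
    static int[] remapAfterMerge(boolean[] deleted) {
        int[] newIds = new int[deleted.length];
        int next = 0;
        for (int oldId = 0; oldId < deleted.length; oldId++) {
            newIds[oldId] = deleted[oldId] ? -1 : next++;
        }
        return newIds;
    }

    public static void main(String[] args) {
        boolean[] noDeletes = {false, false, false, false};
        boolean[] oneDelete = {false, true, false, false};
        // Without deletions every document keeps its number.
        System.out.println(Arrays.toString(remapAfterMerge(noDeletes))); // [0, 1, 2, 3]
        // With docID 1 deleted, the documents after it shift down by one.
        System.out.println(Arrays.toString(remapAfterMerge(oneDelete))); // [0, -1, 1, 2]
    }
}
```

In this model the mapping is the identity exactly when nothing was deleted, which matches the statement above.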
Cam Bazz wrote:
Hello;
If no document is ever deleted nor updated from an index, will the document
id change? under which circumstances will the documen
Hello;
If no document is ever deleted nor updated from an index, will the document
id change? under which circumstances will the document ids change, apart
from delete?
Best Regards,
-C.B.
Well
Philipp and Ronnie Thank you very much indeed
--
Regards,
Mohammad
As long as there are no deletions, the ids will remain unchanged and
it is safe to use them outside.
But in a case where you delete some document, the resulting gap in the
document list will be filled during the next optimize (triggered
manually) or merge operation (may be triggered automatically
Thanks Philipp
2007/4/5, Philipp Nanz <[EMAIL PROTECTED]>:
> That *is* the actual id in the index. There is no other.
> You should be careful using it outside of Lucene though, because
> Lucene may rearrange the document ids during optimization for example.
>
> If you need an application id, ad
Ahh, now i know what you mean...
Forget the above :-)
Use result.id( i )
2007/4/5, Philipp Nanz <[EMAIL PROTECTED]>:
That *is* the actual id in the index. There is no other.
You should be careful using it outside of Lucene though, because
Lucene may rearrange the document ids during optimizati
It's in the FAQ:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-e1de2630fe33fb6eb6733747a5bf870f600e1b4c
Mohammad Norouzi wrote:
but the question is, if I add, say, a document to my index, is Lucene going
to rearrange the internal IDs? Can't I trust them?
Would you tell me in exactly which
Hi
I need the id of the document that is returned by Hits as a result of a query.
Hits result = searchable.find(myQuery);
now I need something like result.getId()
is there any way to get it?
Thanks so much
--
Regards,
Mohammad Norouzi
That *is* the actual id in the index. There is no other.
You should be careful using it outside of Lucene though, because
Lucene may rearrange the document ids during optimization for example.
If you need an application id, add it as an additional stored field to
each document and retrieve that.
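As a rough illustration of why an application id is safer than the internal number, here is a toy sketch in plain Java. It is not Lucene code: the "index" is just a list of stored id values in docID order, and the names (AppIdLookupSketch, findDocId, compact) are hypothetical. It assumes that compaction removes a deleted document and closes the gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: a list position stands in for the internal docID,
// and each element is the application id stored with that document.
class AppIdLookupSketch {

    // Resolve an application id to the current internal position (-1 if absent).
    static int findDocId(List<String> storedIds, String appId) {
        return storedIds.indexOf(appId);
    }

    // Simulate a delete followed by compaction: the gap is closed,
    // so every later document moves down by one position.
    static List<String> compact(List<String> storedIds, String deletedAppId) {
        List<String> merged = new ArrayList<>(storedIds);
        merged.remove(deletedAppId);
        return merged;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("a", "b", "c");
        System.out.println(findDocId(index, "c"));               // 2
        List<String> afterMerge = compact(index, "b");
        // The internal position changed, but the application id still resolves.
        System.out.println(findDocId(afterMerge, "c"));          // 1
    }
}
```

An externally saved position 2 would be stale after the compaction, whereas the stored id "c" still finds the right document.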
sorry to correct my answer:
I need something like this result.doc( i ).getId();
this id from the result (the i ) is starting from 1 but I need the actual id
in the index.
On 4/5/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:
Hi
I need the id of the document that returned by Hits as a result
Doron Cohen/Haifa/[EMAIL PROTECTED] wrote on 14/01/2007 23:04:27:
> David <[EMAIL PROTECTED]> wrote on 14/01/2007 20:08:05:
>
> > thanks, How do Lucene give each document an ID when the document is
> added?
> > Is the document ID unchanged until the document is d
David <[EMAIL PROTECTED]> wrote on 14/01/2007 20:08:05:
> thanks, How do Lucene give each document an ID when the document is
added?
> Is the document ID unchanged until the document is deleted?
>
Not exactly.
When the first doc is added, it is assigned id 0.
Next one assigned
thanks, How does Lucene give each document an ID when the document is added?
Is the document ID unchanged until the document is deleted?
2007/1/12, Otis Gospodnetic <[EMAIL PROTECTED]>:
David, please look at the Javadoc for IndexReader. I believe the API is
reader.document(int), where
How to retrieve the document by document ID?
Hi all:
How does Lucene give each document an ID when the document is added
and how do we retrieve a document by document ID? Appreciate your help!
--
David
-
To unsubscr
Hi all:
How does Lucene give each document an ID when the document is added
and how do we retrieve a document by document ID? Appreciate your help!
--
David
: Nevertheless, all values should be available during the calculation of the overall
: score, which is done inside the Similarity class. Thus, collecting of these should
: result into nearly no runtime overhead, its mainly a question about memory.
Similarity instances don't calculate any scores
lculation of the overall
score, which is done inside the Similarity class. Thus, collecting of these should
result into nearly no runtime overhead, its mainly a question about memory.
We have looked inside Similarity, and all is available except the current document
id - so we have term score values b
How do I obtain the document id in order to delete from index?
Hi,
I've been reading Lucene documentation and I see there are two ways of
deleting a document from an index: by id and by term.
Suppose I have an index with three fields: field1, field2 and field3, and I
want to delete all
Hi,
I've been reading Lucene documentation and I see there are two ways of deleting
a document from an index: by id and by term.
Suppose I have an index with three fields: field1, field2 and field3, and I want
to delete all documents with field1=value1 and field2=value2.
I think I must use deleti
On Tue, 2006-05-09 at 13:53 -0400, varun sood wrote:
> Hi,
> I have "Doc. Id" of the document stored in the database. Now I want to
> query database on that "Doc. Id" (which will always return one document).
> How can I do this?
Are you aware that the document number created by Lucene is conside
Hi,
I have "Doc. Id" of the document stored in the database. Now I want to
query database on that "Doc. Id" (which will always return one document).
How can I do this?
To avoid confusion, I am talking about the "Doc. Id" which Lucene
automatically creates for every document and hence is unique fo
> a better way to discover the id of the document i just added than
> > > docCount() ?
> >
> > When building a new index by strictly adding documents, you could
> > keep a zero-based counter which would reflect document id at that
> > time. They are simply in
ument i just added than
> > docCount() ?
>
> When building a new index by strictly adding documents, you could
> keep a zero-based counter which would reflect document id at that
> time. They are simply in ascending order.
>
> Erik
>
>
> ---
building a new index by strictly adding documents, you could
keep a zero-based counter which would reflect document id at that
time. They are simply in ascending order.
Erik
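The zero-based counter suggested above could be sketched like this in plain Java. It is a toy, not Lucene code: the class name AppendOnlyCounterSketch and its methods are hypothetical, and it assumes a single thread, no deletions, and that every add succeeds. What happens to the numbering when an add fails is not modeled here.

```java
// Toy model only: tracks the id each newly added document would receive
// when documents are strictly appended and never deleted.
class AppendOnlyCounterSketch {
    private int nextDocId = 0;

    // Call once per added document; returns the id assigned in this model.
    int recordAdd() {
        return nextDocId++;
    }

    // Number of documents added so far, which is also the next id to be assigned.
    int docCount() {
        return nextDocId;
    }

    public static void main(String[] args) {
        AppendOnlyCounterSketch counter = new AppendOnlyCounterSketch();
        System.out.println(counter.recordAdd()); // 0
        System.out.println(counter.recordAdd()); // 1
        System.out.println(counter.docCount());  // 2
    }
}
```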
I've got an index which I rebuild each time and don't do any deletes
until the end, so doc ids shouldn't change... at index time, is there
a better way to discover the id of the document i just added than
docCount() ?
--
___
Chris Fraschetti
e [EMAIL
: The simple question - I have a document and I add it into index with
: TermVector support.
: How can I simply retrive the TermVector information for the document?
:
: TermFreqVector vector = reader.getTermFreqVector(document)?
: reader.delete(document);
: Etc..
Open an IndexReader,
Hi,
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> For a domain-centric identifier, use a custom field to store
> (and index perhaps?) it. Lucene's Document id's are internal
> and not controllable.
Unfortunately Lucene contains API that strongly attached to internal id :(
For example -
Hi!
Is there any way to force the document id inside the lucene index, if
I have my own internal numbering scheme, it would be nice to have that
reflected inside the lucene index...anyway?
Simply put your ID as an additional field to your document. You should never
rely on Lucene's document id as
On Jun 24, 2005, at 3:08 PM, Yousef Ourabi wrote:
Hello:
Is there any way to force the document id inside the lucene index, if
I have my own internal numbering scheme, it would be nice to have that
reflected inside the lucene index...anyway?
For a domain-centric identifier, use a custom
Hello:
Is there any way to force the document id inside the lucene index, if
I have my own internal numbering scheme, it would be nice to have that
reflected inside the lucene index...anyway?
If not, is the document ID created on creation or on addition to the
index? And is there a way to retrieve