Re: Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Michael Froh
ote: > I'm new to Lucene and trying to understand the concept of unique document > id, something like a primary key in databases like sql or sqlite etc. > While searching, I came across this article: > https://blog.mikemccandless.com/2014/05/choosing-which actually >

Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Prashant Saxena
I'm new to Lucene and trying to understand the concept of unique document id, something like a primary key in databases like sql or sqlite etc. While searching, I came across this article: https://blog.mikemccandless.com/2014/05/choosing-which actually fast-unique-identifier-uuid.html &

Re: Question: find term by position and document id

2018-04-05 Thread Adrien Grand
able to gather that it checks term positions; therefore, > I thought that it could be possible to find the term by the document id and > the position. However, I was not able to get much further. > `TermsEnum#ord()` ( > https://lucene.apache.org/core/7_2_1/core/org/apache/lucene/index/TermsE

Question: find term by position and document id

2018-04-05 Thread Adam Hornacek
ion of the PhraseQuery I was able to gather that it checks term positions; therefore, I thought that it could be possible to find the term by the document id and the position. However, I was not able to get much further. `TermsEnum#ord()` (https://lucene.apache.org/core/7_2_1/core/org/apache/lu

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-17 Thread Michael McCandless
Daniel Noll wrote: On Monday 17 March 2008 19:38:46 Michael McCandless wrote: Well ... expungeDeletes() first forces a flush, at which point the deletions are flushed as a .del file against the just flushed segment. Still, if you call expungeDeletes after every flush (commit) then it's only 1

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-17 Thread Daniel Noll
On Monday 17 March 2008 19:38:46 Michael McCandless wrote: > Well ... expungeDeletes() first forces a flush, at which point the > deletions are flushed as a .del file against the just flushed > segment. Still, if you call expungeDeletes after every flush > (commit) then it's only 1 segment whose d

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-17 Thread Michael McCandless
Daniel Noll wrote: On Thursday 13 March 2008 19:46:20 Michael McCandless wrote: But, when a normal merge of segments with deletions completes, your docIDs will shift. In trunk we now explicitly compute the docID shifting that happens after a merge, because we don't always flush pending delete

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-16 Thread Daniel Noll
On Thursday 13 March 2008 19:46:20 Michael McCandless wrote: > But, when a normal merge of segments with deletions completes, your > docIDs will shift. In trunk we now explicitly compute the docID > shifting that happens after a merge, because we don't always flush > pending deletes when flushing

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-13 Thread Michael Busch
Daniel Noll wrote: For interest's sake I also timed fetching the document with no FieldSelector, that takes around 410ms for the same documents. So there is still a big benefit in using the field selector, it just isn't anywhere near enough to get it close to the time it takes to retrieve th

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-13 Thread Doron Cohen
On Thu, Mar 13, 2008 at 9:30 PM, Doron Cohen <[EMAIL PROTECTED]> wrote: > Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit(). > I suspect this can be related to the problem you see though I am not sure. > Could you try with the patch there? > Thanks, > Doron Daniel, I was wrong about

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-13 Thread Doron Cohen
ally an > IOException)? I thought something was going wrong in retrieving or > tokenizing the document. > > I don't think flush() helps because it just flushes the pending > deletes as well? > > > - use ++ to determine the next document ID instead of > > index.getWriter(

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-13 Thread Michael McCandless
What exceptions are you actually hitting (is it really an IOException)? I thought something was going wrong in retrieving or tokenizing the document. I don't think flush() helps because it just flushes the pending deletes as well? - use ++ to determine the next document ID

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Daniel Noll
On Thursday 13 March 2008 00:42:59 Erick Erickson wrote: > I certainly found that lazy loading changed my speed dramatically, but > that was on a particularly field-heavy index. > > I wonder if TermEnum/TermDocs would be fast enough on an indexed > (UN_TOKENIZED???) field for a unique id. > > Mostl

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Daniel Noll
- use ++ to determine the next document ID instead of index.getWriter().docCount() (out of sync after an error but fixes itself on optimize(). - Use a field for a separate ID (slower later when reading the index) - ??? Daniel

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Erick Erickson
I certainly found that lazy loading changed my speed dramatically, but that was on a particularly field-heavy index. I wonder if TermEnum/TermDocs would be fast enough on an indexed (UN_TOKENIZED???) field for a unique id. Mostly, I'm hoping you'll try this and tell me if it works so I don't have

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-12 Thread Michael McCandless
Daniel Noll wrote: I have filtered out lines in the log which indicated an exception adding the document; these occur when our Reader throws an IOException and there were so many that it bloated the file. OK, I think very likely this is the issue: when IndexWriter hits an exception whil

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Daniel Noll
On Wednesday 12 March 2008 10:20:12 Michael McCandless wrote: > Oh, so you do not see the problem with SerialMergeScheduler but you > do with ConcurrentMergeScheduler? [...] > Oh, there are no deletions? Then this is very strange. Is it > optimize that messes up the docIDs? Or, is it when you

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Daniel Noll
On Wednesday 12 March 2008 09:53:58 Erick Erickson wrote: > But to me, it always seems...er...fraught to even *think* about relying > on doc ids. I know you've been around the block with Lucene, but do you > have a compelling reason to use the doc ID and not your own unique ID? From memory it was

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Michael McCandless
cations is relying on docIDs? As far as that, we assume that if there are N documents in the index then the next document ID will be N (we determine this before adding the document.) As we're only doing this in a single thread and we never delete documents, this was previously safe. Oh,

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Erick Erickson
reorders the segments? > > > If it's not that ... can you provide more details about how your > > applications is relying on docIDs? > > As far as that, we assume that if there are N documents in the index then > the > next document ID will be N (we de

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Daniel Noll
pplications is relying on docIDs? As far as that, we assume that if there are N documents in the index then the next document ID will be N (we determine this before adding the document.) As we're only doing this in a single

Re: Document ID shuffling under 2.3.x (on merge?)

2008-03-11 Thread Michael McCandless
e previous (blocking) behavior by using SerialMergeScheduler instead. If it's not that ... can you provide more details about how your applications is relying on docIDs? Mike Daniel Noll wrote: Hi all. We're using the document ID to associate extra information stored outside L

Document ID shuffling under 2.3.x (on merge?)

2008-03-10 Thread Daniel Noll
Hi all. We're using the document ID to associate extra information stored outside Lucene. Some of this information is being stored at load-time and some afterwards; later on it turns out the information stored at load-time is returning the wrong results when converting the database con

Re: how to get the programmatic control over index's document id

2008-02-14 Thread John Wang
There was a thread on this exact issue. If you are using 2.3, the payload api would help with that. -John On Thu, Feb 14, 2008 at 3:59 AM, Gauri Shankar <[EMAIL PROTECTED]> wrote: > Thanks a lot for both of you. > > yes, I am talking about internally assigned document id. &

Re: how to get the programmatic control over index's document id

2008-02-14 Thread Erick Erickson
t 6:59 AM, Gauri Shankar <[EMAIL PROTECTED]> wrote: > Thanks a lot for both of you. > > yes, I am talking about internally assigned document id. > > Erick : I am already using the unique id into the index mapped to one of > our > DB's primary key to uniquely identify the docs

Re: how to get the programmatic control over index's document id

2008-02-14 Thread Gauri Shankar
Thanks a lot for both of you. yes, I am talking about internally assigned document id. Erick : I am already using the unique id into the index mapped to one of our DB's primary key to uniquely identify the docs from index. Now to get the value of this unique field i need to call getDo

Re: how to get the programmatic control over index's document id

2008-02-09 Thread Erick Erickson
If you're referring to the internally-assigned document id, I don't think there is a way. Assuming you're trying to assign one yourself or some such. From all the discussions I've seen, I don't think there's even a faint possibility that controlling this w

Re: how to get the programmatic control over index's document id

2008-02-09 Thread Patrick Turcotte
Add a field to your document. document.add(new Field("id", idString)); Or something like that. (Don't have the doc handy right now). Hope this helps. Patrick On Feb 9, 2008 7:38 AM, Gauri Shankar <[EMAIL PROTECTED]> wrote: > Hi, > > I would like to get the control over the docId field from my

how to get the programmatic control over index's document id

2008-02-09 Thread Gauri Shankar
Hi, I would like to get the control over the docId field from my code. Can anyone suggest some way for doing the same? -- Warm Regards, Gauri Shankar

Re: document Id question, again

2008-01-31 Thread Michael McCandless
DocIDs change whenever segments that had deletes pending, get merged. So if you have no deletions, docIDs won't ever change. Mike Cam Bazz wrote: Hello; If no document is ever deleted nor updated from an index, will the document id change? under which circumstances will the documen
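The rule in the reply above (docIDs shift only when segments with pending deletes are merged) can be illustrated with a toy model. This is a minimal sketch, not Lucene code: the `Doc` class, the list-of-lists "segments", and the method names are all hypothetical, and a docID is assumed to be nothing more than a document's position counted across segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Lucene code. A "segment" is a plain list of
// documents, and a docID is a document's position counted across segments.
final class DocIdShiftDemo {

    // Hypothetical stand-in for an indexed document with a deletion flag.
    static final class Doc {
        final String key;
        final boolean deleted;

        Doc(String key, boolean deleted) {
            this.key = key;
            this.deleted = deleted;
        }
    }

    // Returns the position-based docID of the live document with this key, or -1.
    static int docIdOf(List<List<Doc>> segments, String key) {
        int docId = 0;
        for (List<Doc> segment : segments) {
            for (Doc doc : segment) {
                if (!doc.deleted && doc.key.equals(key)) {
                    return docId;
                }
                docId++; // a deleted doc still occupies a slot until it is merged away
            }
        }
        return -1;
    }

    // Merges all segments into a single segment, dropping deleted documents.
    static List<List<Doc>> merge(List<List<Doc>> segments) {
        List<Doc> merged = new ArrayList<>();
        for (List<Doc> segment : segments) {
            for (Doc doc : segment) {
                if (!doc.deleted) {
                    merged.add(doc);
                }
            }
        }
        List<List<Doc>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    // Builds a two-segment sample index with one deleted document and returns
    // the docID of document "c" before and after the merge.
    static int[] sampleBeforeAndAfter() {
        List<List<Doc>> index = Arrays.asList(
            Arrays.asList(new Doc("a", false), new Doc("b", true)),
            Arrays.asList(new Doc("c", false)));
        return new int[] { docIdOf(index, "c"), docIdOf(merge(index), "c") };
    }

    public static void main(String[] args) {
        int[] ids = sampleBeforeAndAfter();
        System.out.println("docID of c before merge: " + ids[0]); // 2
        System.out.println("docID of c after merge:  " + ids[1]); // 1
    }
}
```

In this model, if no document is flagged as deleted, `merge` leaves every position unchanged, which matches the "no deletions, no docID changes" statement in the reply.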

document Id question, again

2008-01-31 Thread Cam Bazz
Hello; If no document is ever deleted nor updated from an index, will the document id change? under which circumstances will the document ids change, apart from delete? Best Regards, -C.B.

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Mohammad Norouzi
Well Philipp and Ronnie Thank you very much indeed -- Regards, Mohammad

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Philipp Nanz
As long as there are no deletions, the ids will remain unchanged and it is safe to use them outside. But in a case where you delete some document, the resulting gap in the document list will be filled during the next optimize (triggered manually) or merge operation (may be triggered automatically

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Mohammad Norouzi
Thanks Philipp 2007/4/5, Philipp Nanz <[EMAIL PROTECTED]>: > That *is* the actual id in the index. There is no other. > You should be careful using it outside of Lucene though, because > Lucene may rearrange the document ids during optimization for example. > > If you need an application id, ad

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Philipp Nanz
Ahh, now i know what you mean... Forget the above :-) Use result.id( i ) 2007/4/5, Philipp Nanz <[EMAIL PROTECTED]>: That *is* the actual id in the index. There is no other. You should be careful using it outside of Lucene though, because Lucene may rearrange the document ids during optimizati

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Ronnie Kolehmainen
It's in the FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ#head-e1de2630fe33fb6eb6733747a5bf870f600e1b4c Mohammad Norouzi wrote: but the question is, if I add, say, a document to my index, is lucene going to re arrange the internal IDs? can't I trust them? Would you tell me in exactly which

I need the internal lucene's document id from Hits

2007-04-05 Thread Mohammad Norouzi
Hi I need the id of the document that returned by Hits as a result of a query. Hits result = searchable.find(myQuery); now I need something like result.getId() is there any way to get it? Thanks so much -- Regards, Mohammad Norouzi

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Philipp Nanz
That *is* the actual id in the index. There is no other. You should be careful using it outside of Lucene though, because Lucene may rearrange the document ids during optimization for example. If you need an application id, add it as an additional stored field to each document and retrieve that.
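The advice above (store an application id as its own field and look documents up by that, not by the internal id) might be sketched as follows. This is a toy model and not the Lucene API: a document is assumed to be a map of stored fields, its list position stands in for the internal id, and the field name `"id"`, the sample ids, and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the Lucene API. Each "document" is a map of stored
// fields, and its position in the list plays the role of the internal id.
final class AppIdLookupDemo {

    // Creates a document carrying an application-level id as a stored field.
    static Map<String, String> doc(String appId, String title) {
        Map<String, String> fields = new HashMap<>();
        fields.put("id", appId);
        fields.put("title", title);
        return fields;
    }

    // Finds a document by its application-level "id" field, or returns null.
    static Map<String, String> findByAppId(List<Map<String, String>> index, String appId) {
        for (Map<String, String> d : index) {
            if (appId.equals(d.get("id"))) {
                return d;
            }
        }
        return null;
    }

    // Builds three documents, removes the first one (as compaction after a
    // delete would), then looks up "second" by its old position and by its id.
    static String[] lookupAfterCompaction() {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(doc("A-1", "first"));
        index.add(doc("B-2", "second")); // position 1 before the removal
        index.add(doc("C-3", "third"));
        index.remove(0);

        String byOldPosition = index.get(1).get("title");        // now a different document
        String byAppId = findByAppId(index, "B-2").get("title"); // still the same document
        return new String[] { byOldPosition, byAppId };
    }

    public static void main(String[] args) {
        String[] titles = lookupAfterCompaction();
        System.out.println("by old position 1: " + titles[0]); // third
        System.out.println("by app id B-2:     " + titles[1]); // second
    }
}
```

The linear scan is only for brevity; the point of the sketch is that the application id travels with the document while the position does not.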

Re: I need the internal lucene's document id from Hits

2007-04-05 Thread Mohammad Norouzi
sorry to correct my answer: I need something like this result.doc( i ).getId(); this id from the result (the i ) is starting from 1 but I need the actual id in the index. On 4/5/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote: Hi I need the id of the document that returned by Hits as a result

Re: How to retrieve the document by document ID?

2007-01-15 Thread Doron Cohen
Doron Cohen/Haifa/[EMAIL PROTECTED] wrote on 14/01/2007 23:04:27: > David <[EMAIL PROTECTED]> wrote on 14/01/2007 20:08:05: > > > thanks, How do Lucene give each document an ID when the document is > added? > > Is the document ID unchanged until the document is d

Re: How to retrieve the document by document ID?

2007-01-14 Thread Doron Cohen
David <[EMAIL PROTECTED]> wrote on 14/01/2007 20:08:05: > thanks, How do Lucene give each document an ID when the document is added? > Is the document ID unchanged until the document is deleted? > Not exactly. When the first doc is added, it is assigned id 0. Next one assigned

Re: How to retrieve the document by document ID?

2007-01-14 Thread David
thanks, How do Lucene give each document an ID when the document is added? Is the document ID unchanged until the document is deleted? 2007/1/12, Otis Gospodnetic <[EMAIL PROTECTED]>: David, please look at the Javadoc for IndexReader. I believe the API is reader.document(int), where

Re: How to retrieve the document by document ID?

2007-01-12 Thread Otis Gospodnetic
How to retrieve the document by document ID? Hi all: How do Lucene give each document an ID when the document is added and How do we retrieve a document by document ID? appreciate your help! -- David - To unsubscr

How to retrieve the document by document ID?

2007-01-12 Thread David
Hi all: How do Lucene give each document an ID when the document is added and How do we retrieve a document by document ID? appreciate your help! -- David

Re: result explanations / how to get the current document id inside a similarity subclass

2006-11-10 Thread Chris Hostetter
: Nevertheless, all values should be available during the calculation of the overall : score, which is done inside the Similarity class. Thus, collecting of these should : result into nearly no runtime overhead, its mainly a question about memory. Similarity instances don't calculate any scores

result explanations / how to get the current document id inside a similarity subclass

2006-11-10 Thread duiduder
lculation of the overall score, which is done inside the Similarity class. Thus, collecting of these should result into nearly no runtime overhead, its mainly a question about memory. We have looked inside Similarity, and all is available except the current document id - so we have term score values b

Re: How do I obtain the document id in order to delete from index?

2006-10-30 Thread Enrique Lamas
How do I obtain the document id in order to delete from index? Hi, I've been reading Lucene documentation and I see there are two ways of deleting a document from an index: by id and by term. Suppose I have an index with three fields: field1, field2 and field3, and I want to delete all

How do I obtain the document id in order to delete from index?

2006-10-30 Thread Enrique Lamas
Hi, I've been reading Lucene documentation and I see there are two ways of deleting a document from an index: by id and by term. Suppose I have an index with three fields: field1, field2 and field3, and I want to delete all documents with field1=value1 and field2=value2. I think I must use deleti

Re: Retrieving field or Document using document id.

2006-05-09 Thread karl wettin
On Tue, 2006-05-09 at 13:53 -0400, varun sood wrote: > Hi, > I have "Doc. Id" of the document stored in the database. Now I want to > query database on that "Doc. Id" (which will always return one document). > How can I do this? Are you aware that the document number created by Lucene is conside

Retrieving field or Document using document id.

2006-05-09 Thread varun sood
Hi, I have "Doc. Id" of the document stored in the database. Now I want to query database on that "Doc. Id" (which will always return one document). How can I do this? To avoid confusion, I am talking about the "Doc. Id" which Lucene automatically creates for every document and hence is unique fo

Re: indexed document id

2005-08-01 Thread Otis Gospodnetic
> a better way to discover the id of the document i just added than > > > docCount() ? > > > > When building a new index by strictly adding documents, you could > > keep a zero-based counter which would reflect document id at that > > time. They are simply in

Re: indexed document id

2005-08-01 Thread Chris Fraschetti
ument i just added than > > docCount() ? > > When building a new index by strictly adding documents, you could > keep a zero-based counter which would reflect document id at that > time. They are simply in ascending order. > > Erik > > > ---

Re: indexed document id

2005-07-29 Thread Erik Hatcher
building a new index by strictly adding documents, you could keep a zero-based counter which would reflect document id at that time. They are simply in ascending order. Erik - To unsubscribe, e-mail: [EMAIL PROTECTED]

indexed document id

2005-07-29 Thread Chris Fraschetti
I've got an index which I rebuild each time and don't do any deletes until the end, so doc ids shouldn't change... at index time, is there a better way to discover the id of the document i just added than docCount() ? -- ___ Chris Fraschetti e [EMAIL

RE: Document ID

2005-06-25 Thread Chris Hostetter
: The simple question - I have a document and I add it into index with : TermVector support. : How can I simply retrive the TermVector information for the document? : : TermFreqVector vector = reader.getTermFreqVector(document)? : reader.delete(document); : Etc.. Open an IndexReader,

RE: Document ID

2005-06-25 Thread Pasha Bizhan
Hi, > From: Erik Hatcher [mailto:[EMAIL PROTECTED] > For a domain-centric identifier, use a custom field to store > (and index perhaps?) it. Lucene's Document id's are internal > and not controllable. Unfortunately Lucene contains API that strongly attached to internal id :( For example -

Re: Document ID

2005-06-24 Thread Mario Ivankovits
Hi! Is there any way to force the document id inside the lucene index, if I have my own internal numbering scheme, it would be nice to have that reflected inside the lucene index...anyway? Simply put your ID as additional field to your document. You never should rely on lucenes document id as

Re: Document ID

2005-06-24 Thread Erik Hatcher
On Jun 24, 2005, at 3:08 PM, Yousef Ourabi wrote: Hello: Is there any way to force the document id inside the lucene index, if I have my own internal numbering scheme, it would be nice to have that reflected inside the lucene index...anyway? For a domain-centric identifier, use a custom

Document ID

2005-06-24 Thread Yousef Ourabi
Hello: Is there any way to force the document id inside the lucene index, if I have my own internal numbering scheme, it would be nice to have that reflected inside the lucene index...anyway? IF not is the document ID created on creation or on addition to the index? And is there a way to retrieve