You can check out the file format of Lucene's term dictionary here: http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary

That might give you some insight.

As far as I can tell, Lucene does not keep IDs for terms, only for documents, and a document ID is really just an offset. You look up the term you want and then follow an ID/offset to the docs that contain it, so I don't see a mechanism for anything like the reverse.

You can walk the term dictionary with TermEnum: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/TermEnum.html

So you could count how many times you call next() to reach each term in your doc and use that count as the term's ID. It would be pretty slow, though. Others might have better ideas for this.
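To make the counting idea concrete, here is a rough sketch in plain Java. It does not use Lucene at all: a sorted TreeSet stands in for the term dictionary, and a plain Iterator stands in for TermEnum, on the assumption that the enumeration returns terms in sorted order. The class name TermOrdinals and the whitespace/lowercase tokenizer are made up for illustration; a real index would use whatever analyzer you indexed with.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class TermOrdinals {

    // Walk the terms in order and number them by position (1-based),
    // the same way one would count next() calls on a term enumeration.
    public static Map<String, Integer> assignOrdinals(Iterator<String> sortedTerms) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        int position = 0;
        while (sortedTerms.hasNext()) {
            position++;
            ids.put(sortedTerms.next(), position);
        }
        return ids;
    }

    // Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String[] docs = { "The sea is blue", "Sky is blue" };

        // Assumption: the sorted set plays the role of the term dictionary.
        TreeSet<String> dictionary = new TreeSet<>();
        for (String doc : docs) {
            dictionary.addAll(tokenize(doc));
        }

        Map<String, Integer> ids = assignOrdinals(dictionary.iterator());

        // Represent each doc as a sequence of term positions.
        for (String doc : docs) {
            StringBuilder line = new StringBuilder();
            for (String term : tokenize(doc)) {
                line.append(ids.get(term)).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Note that the numbers come out in sorted order (blue=1, is=2, sea=3, sky=4, the=5), not in first-seen order, and a position shifts whenever a term that sorts earlier is added or removed. So this gives the same number for a word across docs only while the dictionary stays unchanged.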

- Mark

Ilias Flaounas wrote:
I want to have IDs for the terms (words), not the documents!
Also, I need the same ID for a word if it appears in more than one document.

Example:
Doc1: The sea is blue
Doc2: Sky is blue

For these two docs the dictionary would be [the]->1 [sea]->2 [is]->3
[blue]->4 [sky]->5

So I want to represent these docs by word-ids like this:
Doc1: 1 2 3 4
Doc2: 5 3 4

Is there a way to use Lucene for this? I mean Lucene stores an
internal dictionary. How can I access it?

Thank you,
Ilias


On 10/31/07, Mark Miller <[EMAIL PROTECTED]> wrote:
The id does change. You need to index your own "id" field with the document.


Ilias Flaounas wrote:
Dear experts,

I need to store and index a string of text into Lucene, and later I
want to get the Id of each term inside this string. Is it possible?
How can I do that?

I want a unique association, term (in my case a word) -> Id. I know
that if I delete a document, the dictionary changes. Does the term id
change?


Thanks a lot
Ilias

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


