You can check out the file format of Lucene's term dictionary here:
http://lucene.apache.org/java/docs/fileformats.html#Term%20Dictionary
That might give you some insight.
As far as I can tell, Lucene does not keep ids for terms, only for
documents, and a document id is really just an offset. Because you look
up the term first and then follow an id/offset to the docs that contain
it, I don't see a mechanism for anything like the reverse.
You can walk the term dictionary with TermEnum:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/TermEnum.html
So to get an id, you might count how many times you have to call next()
to reach each term in your doc. That would be pretty slow, though. Others
might have better ideas for this.
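
A rough sketch of that counting idea, in plain Java rather than against the
Lucene API: the TreeSet below is only a stand-in for a term dictionary that is
enumerated in sorted order, and SortedTermOrdinals and ordinalOf are
hypothetical names made up for the illustration. It also shows why such an id
is fragile: adding a term that sorts earlier shifts the ids of the terms after
it.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Plain-Java stand-in for "count the next() calls". No Lucene class is used:
// the TreeSet only mimics a dictionary that is walked in sorted order.
class SortedTermOrdinals {

    // Steps through the sorted terms one at a time and returns how many steps
    // it took to reach the target (1-based), or -1 if the term is not present.
    static int ordinalOf(TreeSet<String> dictionary, String target) {
        int steps = 0;
        for (String term : dictionary) {
            steps++;
            if (term.equals(target)) {
                return steps;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        TreeSet<String> dictionary =
                new TreeSet<>(Arrays.asList("the", "sea", "is", "blue", "sky"));
        // Sorted order is: blue, is, sea, sky, the
        System.out.println(ordinalOf(dictionary, "sea")); // prints 3

        // A new term that sorts before "sea" shifts its position.
        dictionary.add("air");
        System.out.println(ordinalOf(dictionary, "sea")); // prints 4
    }
}
```

Each lookup is a linear scan, which is the slowness mentioned above, and the
ids are only stable while the set of terms does not change.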
- Mark
Ilias Flaounas wrote:
I want to have IDs for the terms (words) not the documents!
Also, I need the same ID for a word if it appears in more than one document.
Example:
Doc1: The sea is blue
Doc2: Sky is blue
For these two docs the dictionary would be [the]->1 [sea]->2 [is]->3
[blue]->4 [sky]->5
So I want to represent these docs by word-ids like this:
Doc1: 1 2 3 4
Doc2: 5 3 4
Is there a way to use Lucene for this? I mean Lucene stores an
internal dictionary. How can I access it?
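
For illustration only, the mapping in the example above could be kept in an
application-side dictionary, outside Lucene. This is a minimal sketch under
that assumption: TermIdDictionary, idOf and encode are hypothetical names,
tokenization is a plain whitespace split with lowercasing, and nothing here is
Lucene's own mechanism.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical application-side dictionary: each new word gets the next free id,
// and a word that was seen before keeps the id it was given the first time.
class TermIdDictionary {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    // Returns the existing id for the term, or assigns the next one (starting at 1).
    int idOf(String term) {
        Integer id = ids.get(term);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(term, id);
        }
        return id;
    }

    // Lowercases the text, splits it on whitespace, and maps each word to its id.
    List<Integer> encode(String text) {
        List<Integer> encoded = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                encoded.add(idOf(token));
            }
        }
        return encoded;
    }

    public static void main(String[] args) {
        TermIdDictionary dictionary = new TermIdDictionary();
        System.out.println(dictionary.encode("The sea is blue")); // prints [1, 2, 3, 4]
        System.out.println(dictionary.encode("Sky is blue"));     // prints [5, 3, 4]
    }
}
```

Because ids are handed out in first-seen order and never reused, deleting a
document does not change them, but the map has to be persisted alongside the
index.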
Thank you,
Ilias
On 10/31/07, Mark Miller <[EMAIL PROTECTED]> wrote:
The id does change. You need to index your own "id" field with the document.
Ilias Flaounas wrote:
Dear experts,
I need to store and index a string of text into Lucene, and later I
want to get the Id of each term inside this string. Is it possible?
How can I do that?
I want a unique association, term (in my case a word) -> Id. I know
that if I delete a document, the dictionary changes. Does the term id
change?
Thanks a lot
Ilias
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]