JDBM is surely a better approach than an in-memory hash map.
But since all previous documents are already in the index, even though the
index is not closed yet, I feel there should be a way to read all previous terms.
It's OK to use an additional data structure, like JDBM or a hash map, to
duplicate the terms, in order to look
I use JDBM to store each document's key ID.
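
For illustration only, here is a minimal sketch of the "additional data structure" idea using a plain in-memory set. The class name SeenKeys and its methods are hypothetical, not part of Lucene or JDBM. It assumes each document has a single string key ID. If the keys do not fit in memory, a persistent map such as JDBM could presumably take the place of the HashSet.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or JDBM: remembers the key ID of
// every document handed to the indexer so far, including documents that the
// index writer has not flushed or committed yet.
class SeenKeys {
    // Assumption: all key IDs fit in memory.
    private final Set<String> keys = new HashSet<String>();

    /** Records the key and returns true if it is new; returns false for a duplicate. */
    boolean markIfNew(String keyId) {
        return keys.add(keyId);
    }

    /** Returns true if the key has already been recorded. */
    boolean contains(String keyId) {
        return keys.contains(keyId);
    }

    /** Number of distinct keys recorded so far. */
    int size() {
        return keys.size();
    }
}
```

The indexing loop would call markIfNew(keyId) before adding a document and skip (or update) the document when it returns false, so no term lookup against the still-open index is needed.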
2008/12/30 Chris Lu
> Otis, thanks for the pointer.
> I think the question can be:
>
> How to access TermEnum or TermInfos during indexing.
>
> If this is possible, things would be easier.
>
> --
> Chris Lu
> -
> Instant Scalable Full-
Otis, thanks for the pointer.
I think the question can be:
How to access TermEnum or TermInfos during indexing.
If this is possible, things would be easier.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: h
Chris,
Mark Miller & Co. are working on (Near) Duplicate Detection. I think the work
is in Solr's JIRA, but some of it might be applicable to Lucene.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: Chris Lu
> To: "java-user@lucene.apach