On 11/09/2012 09:41 AM, jake dsouza wrote:
Hello,
Has anyone worked on making Lucene index and search versioned document
collections, i.e., any corpus with multiple versions of its documents, such
as Wikipedia or source code?
I am working on a project to index and search versioned collections while
keeping the index size minimal by exploiting the differences between
versions.
Could someone direct me to any existing efforts to make Lucene work with
versions?
Hello Jake,
I never found the time, but this is still on my todo list for a versioned
XML DBS [1]. My problem is that I would need access to the internal
buckets, nodes, or whatever index structure Lucene uses. With a PATRICIA
trie, for instance, it's very simple in my system: I can just store the
nodes, which are then versioned (CoW principle, such that only changed
nodes are written, depending on the versioning strategy used; possibly
also a bunch of nodes in a "page" which holds a set of nodes). I never
figured out how to do this with Lucene, which is why I'm thinking about
implementing or simply integrating a PATRICIA trie and enhancing an XQuery
parser with full-text capabilities.
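
To make the CoW idea a bit more concrete, here is a minimal sketch in Java.
It is only an illustration under several assumptions: it uses a plain
(uncompressed) character trie rather than a real PATRICIA trie, it keeps
everything in memory instead of writing nodes or pages to disk, and the
names VersionedTrie, insert, contains and sharesSubtree are made up for
this example. It is neither how sirix stores nodes nor any Lucene API.
Each insert creates a new version by copying only the nodes on the path to
the inserted term; all other nodes are shared with the previous version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a copy-on-write (path-copying) trie, one root per version. */
public class VersionedTrie {

    /** Immutable node: never modified after construction. */
    static final class Node {
        final Map<Character, Node> children;
        final boolean terminal;

        Node(Map<Character, Node> children, boolean terminal) {
            this.children = children;
            this.terminal = terminal;
        }
    }

    private static final Node EMPTY = new Node(new HashMap<>(), false);

    /** roots.get(v) is the root node of version v; version 0 is the empty trie. */
    private final List<Node> roots = new ArrayList<>();

    public VersionedTrie() {
        roots.add(EMPTY);
    }

    /** Derives a new version from the latest one and returns its version number. */
    public int insert(String term) {
        Node latest = roots.get(roots.size() - 1);
        roots.add(insert(latest, term, 0));
        return roots.size() - 1;
    }

    /** Returns a node containing the term; copies only nodes on the changed path. */
    private static Node insert(Node node, String term, int depth) {
        if (depth == term.length()) {
            // The children map can be shared because maps are never mutated.
            return node.terminal ? node : new Node(node.children, true);
        }
        char c = term.charAt(depth);
        Node child = node.children.getOrDefault(c, EMPTY);
        Node newChild = insert(child, term, depth + 1);
        if (newChild == child) {
            return node; // term was already present, nothing to copy
        }
        Map<Character, Node> copy = new HashMap<>(node.children);
        copy.put(c, newChild);
        return new Node(copy, node.terminal);
    }

    /** Looks up a term as of the given version. */
    public boolean contains(int version, String term) {
        Node node = roots.get(version);
        for (int i = 0; i < term.length() && node != null; i++) {
            node = node.children.get(term.charAt(i));
        }
        return node != null && node.terminal;
    }

    /** True if both versions reference the very same subtree below the given first character. */
    public boolean sharesSubtree(int versionA, int versionB, char first) {
        Node a = roots.get(versionA).children.get(first);
        Node b = roots.get(versionB).children.get(first);
        return a != null && a == b;
    }
}
```

For example, after inserting "lucene", "lucid" and "sirix" in that order,
contains(1, "lucid") is false while contains(3, "lucid") is true, and
versions 2 and 3 share the whole subtree below 'l', because the third
insert only touched the path for "sirix".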
However, _if_ it's possible with Lucene, it would be great :-) That said,
it's open source, so maybe someone who sees value in it will be motivated
to contribute, but that's just a wish ;-)
kind regards,
Johannes
[1] https://github.com/JohannesLichtenberger/sirix