On 11/09/2012 09:41 AM, jake dsouza wrote:
Hello,

Has anyone worked on making Lucene index and search versioned document
collections, i.e. any corpus with multiple versions of documents, similar to
Wikipedia or source code?
I am working on a project to index and search versioned collections while
keeping the index size to a minimum by exploiting the differences between
versions.

Could someone direct me to any existing efforts to make Lucene work with
versions?

Hello Jake,

I never found the time, but it's still on my to-do list for a versioned XML DBS [1]. That is also where my problem lies: I would somehow need access to the internal buckets, nodes, or whatever index structure Lucene uses. With a PATRICIA trie, for instance, this is very simple in my system: I can just store the nodes, which are then versioned following the copy-on-write (CoW) principle, so that only changed nodes are written, depending on the versioning strategy used (possibly a bunch of nodes grouped in a "page" that holds a set of nodes). I never figured out how to do this with Lucene, which is why I'm thinking about implementing or simply integrating a PATRICIA trie and enhancing an XQuery parser with full-text capabilities.
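
To make the copy-on-write idea a bit more concrete, here is a minimal sketch in Java. It is not sirix code and has nothing to do with Lucene internals; it also uses a plain character trie instead of a real PATRICIA trie to keep it short. The names VersionedTrie, Node, insert, contains and rootOf are made up for illustration. Each insert creates a new version by copying only the nodes on the path to the new term, while all other nodes are shared with older versions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a copy-on-write character trie that keeps one root per version. */
public class VersionedTrie {

    /** Immutable node: never modified after creation, so versions can share it. */
    static final class Node {
        final Map<Character, Node> children;
        final boolean terminal;

        Node(Map<Character, Node> children, boolean terminal) {
            this.children = children;
            this.terminal = terminal;
        }
    }

    private static final Node EMPTY = new Node(new HashMap<>(), false);

    // roots.get(v) is the root of version v; version 0 is the empty trie.
    private final List<Node> roots = new ArrayList<>();

    public VersionedTrie() {
        roots.add(EMPTY);
    }

    /** Adds a term on top of the latest version and returns the new version number. */
    public int insert(String term) {
        Node latest = roots.get(roots.size() - 1);
        roots.add(insert(latest, term, 0));
        return roots.size() - 1;
    }

    // Path copying: only nodes on the path from the root to the term are recreated.
    private static Node insert(Node node, String term, int depth) {
        if (depth == term.length()) {
            return node.terminal ? node : new Node(node.children, true);
        }
        char c = term.charAt(depth);
        Node child = node.children.getOrDefault(c, EMPTY);
        Node newChild = insert(child, term, depth + 1);
        if (newChild == child) {
            return node; // term already present, nothing to copy
        }
        Map<Character, Node> copy = new HashMap<>(node.children);
        copy.put(c, newChild);
        return new Node(copy, node.terminal);
    }

    /** Looks up a term as of the given version. */
    public boolean contains(int version, String term) {
        Node node = roots.get(version);
        for (int i = 0; i < term.length() && node != null; i++) {
            node = node.children.get(term.charAt(i));
        }
        return node != null && node.terminal;
    }

    /** Exposes the root of a version so that node sharing can be inspected. */
    Node rootOf(int version) {
        return roots.get(version);
    }
}
```

After inserting "lucene" and then "sirix", version 1 only knows "lucene", version 2 knows both, and the subtree below 'l' is the very same object in both versions. A real system would write the changed nodes (or pages of nodes) to disk instead of keeping them in memory.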

However, _if_ it's possible with Lucene, that would be great :-) That said, it's open source, so maybe someone will find it valuable and be motivated to contribute, but that's just a wish ;-)

kind regards,
Johannes

[1] https://github.com/JohannesLichtenberger/sirix

