Re: Indexing and searching across versioned document collections

Johannes.Lichtenberger Fri, 09 Nov 2012 00:54:15 -0800

On 11/09/2012 09:41 AM, jake dsouza wrote:

Hello,


Has anyone worked on making Lucene index and search versioned document
collections, i.e. any corpus with multiple versions of documents, similar to
Wikipedia or source code?
I am working on a project to index and search versioned collections while
keeping the index size to a minimum by taking the differences between
versions into account.

Could someone direct me to any existing efforts to make Lucene work with
versions?


Hello Jake,

I never found the time, but this is still on my to-do list for a versioned XML DBS [1]. My problem is the same, though: I would somehow need access to the internal buckets, nodes, or whatever index structure Lucene uses. With a PATRICIA trie, for instance, it's very simple in my system, as I can just store the trie nodes, which are then versioned (copy-on-write, such that only changed nodes are written, depending on the versioning strategy used; possibly also a bunch of nodes at once in a "page" that holds a set of nodes). I never figured out how to do this with Lucene, which is why I'm thinking about implementing or simply integrating a PATRICIA trie and enhancing an XQuery parser with full-text capabilities.
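
To make the copy-on-write idea a bit more concrete, here is a rough sketch in Java. It is not how sirix or Lucene actually store anything: it is a plain character trie without the path compression of a real PATRICIA trie, everything lives in memory, and the names Node and VersionedTrie are made up for illustration. The only point is that each revision gets its own root, and inserting a key copies just the nodes along that key's path while all other subtrees are shared with the previous revision.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Immutable trie node (hypothetical name): an insert copies only the nodes on the key's path. */
final class Node {
    final Map<Character, Node> children; // never mutated after construction
    final boolean terminal;              // true if a key ends at this node

    Node(Map<Character, Node> children, boolean terminal) {
        this.children = children;
        this.terminal = terminal;
    }

    static final Node EMPTY = new Node(new HashMap<>(), false);

    /** Returns a root that contains key; untouched subtrees are shared, not copied. */
    Node insert(String key, int pos) {
        if (pos == key.length()) {
            return terminal ? this : new Node(children, true);
        }
        char c = key.charAt(pos);
        Node child = children.getOrDefault(c, EMPTY);
        Node newChild = child.insert(key, pos + 1);
        if (newChild == child) {
            return this; // key was already present, nothing to write
        }
        Map<Character, Node> copy = new HashMap<>(children);
        copy.put(c, newChild);
        return new Node(copy, terminal);
    }

    boolean contains(String key) {
        Node n = this;
        for (int i = 0; i < key.length() && n != null; i++) {
            n = n.children.get(key.charAt(i));
        }
        return n != null && n.terminal;
    }
}

/** Keeps one root per revision (hypothetical name); revision 0 is the empty trie. */
final class VersionedTrie {
    private final List<Node> roots = new ArrayList<>();

    VersionedTrie() {
        roots.add(Node.EMPTY);
    }

    /** Adds key on top of the latest revision and returns the new revision number. */
    int commit(String key) {
        Node latest = roots.get(roots.size() - 1);
        roots.add(latest.insert(key, 0));
        return roots.size() - 1;
    }

    boolean contains(int revision, String key) {
        return roots.get(revision).contains(key);
    }

    Node root(int revision) {
        return roots.get(revision);
    }
}
```

Committing "lucene" and then "sirix" gives revisions 1 and 2. A lookup for "sirix" succeeds only in revision 2, while the subtree below 'l' is the very same object in both revisions. In a persistent store the analogue would be to write only the newly created nodes (or the page containing them) for each revision.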

However, _if_ it's possible with Lucene it would be great :-) That said, it's open source, and maybe someone would see some value in it and be motivated to contribute, but that's just a wish ;-)


kind regards,
Johannes

[1] https://github.com/JohannesLichtenberger/sirix


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
