Hi everyone,
I searched the list archives, but couldn't find a question that closely matches mine.

The project I'm working on is designed to allow searching a distributed collection of data repositories. Currently, we index each repository to build a central Lucene index. This works OK, but for practical (the central index is getting very large) and architectural (decentralization is a design goal) reasons, we'd like to distribute the index.

In the past, we had a basic federation system in place: when a user submitted a query, the query was broadcast to each data repository, which had its own independent Lucene index. Results from each repo were aggregated in reverse order. The problem was, of course, that since each index was constructed independently of all the others, and documents are distributed unevenly across the repos, it was impossible to rank the results from all the indices in a meaningful way. We basically punted and interleaved results, which gave a bad user experience, hence the temporary switch to a central index.

So, what options exist for searching distributed collections of Lucene indices and ranking results meaningfully?

Katta seems promising, but I don't know enough about it yet. It also seems to want to open its own ports for RPC. I'd prefer something that could tunnel over HTTP to minimize firewall drama. (We will have 10s and then 100s of data repos running in separate locations.)

We're also considering a home-grown scheme that normalizes the denominators of all the index components in all our indices, based on the sums of counts obtained from all the indices. This feels like re-inventing the wheel, and it's not clear to me yet that the low-level manipulation of indices that we'd need to do is even possible.

Any suggestions for distributing indices while ranking results well are very welcome!
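P.S. To make the home-grown idea above a bit more concrete, here is a rough sketch of what I have in mind: each repository reports its document count and per-term document frequencies, the counts are summed, and every document is then scored against the summed values so that scores from different repositories become comparable. This is plain Python with a simplified TF-IDF formula, not Lucene's actual scoring, and all the names (`local_stats`, `merge_stats`, `score`, `federated_search`) are hypothetical:

```python
import math
from collections import Counter


def local_stats(docs):
    """Statistics one repository would report: (doc count, per-term doc frequency)."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    return len(docs), df


def merge_stats(stats):
    """Sum the counts reported by all repositories into global statistics."""
    total_docs = sum(n for n, _ in stats)
    global_df = Counter()
    for _, df in stats:
        global_df.update(df)  # Counter.update adds the counts together
    return total_docs, global_df


def score(tokens, query, total_docs, global_df):
    """Simplified TF-IDF score that uses global counts, not per-repository ones."""
    tf = Counter(tokens)
    total = 0.0
    for term in query:
        if tf[term]:
            idf = 1.0 + math.log(total_docs / (1.0 + global_df[term]))
            total += math.sqrt(tf[term]) * idf
    return total


def federated_search(repos, query):
    """Score every repository's documents against shared statistics, then merge."""
    total_docs, global_df = merge_stats([local_stats(d) for d in repos.values()])
    hits = []
    for repo_name, docs in repos.items():
        for doc_id, tokens in docs.items():
            s = score(tokens, query, total_docs, global_df)
            if s > 0:
                hits.append((s, repo_name, doc_id))
    # All scores share the same denominators, so a plain sort gives one ranking.
    return sorted(hits, reverse=True)
```

In a real deployment the statistics exchange and the scoring would happen on different machines (ideally over HTTP), and the open question remains whether Lucene lets us plug externally supplied counts into its scoring at all.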