Hi, we are in the process of planning a search feature of a product and we are having quite a hard time figuring out the "right" way to do it.
The requirements for our app are the following: 1) Large number of indices (at _least_ 10000) 2) The amount of data involved per index is not very high, but because of the number of indices involved the data set will be something about 500 - 1000 GB 3) The searching capabilities must be fail safe, while it's acceptable if deletes/updates can take some time. 4) The majority of operations will be searching the indices. I've followed the mailing list intensively the last month and especially the "Best Practices for Distributing Lucene Indexing and Searching" (http://marc.theaimsgroup.com/?l=lucene-user&m=110971318020691&w=2) and "Real time indexing and distribution to lucene on separate boxes (long)" (http://marc.theaimsgroup.com/?l=lucene-user&m=107900097217474&w=2) threads provided some interesting insight. But still our requirements are a bit different. My thoughts how the above could be handled, so far are: 1) Have one *really big* "master" which handles all tasks related to index manipulation. Sync the indices according to Doug's tips http://marc.theaimsgroup.com/?l=lucene-user&m=110973989200204&w=2 out to a cluster of slaves that are responsible for searching. Problem: How to make sure that indices across slaves are in sync. Big Problem: Syncing of this large number of indices will cause a lot of traffic and cause already quite a load on the slaves (not to speak of the master) 1.1) Is it safe to _search_ (only) an index mounted via NFS? If yes, then the search boxes could mount the indices on the master box. But this solution would probably lead to some serious perfomance issues because of the needed disk I/O on the master. Though I'd love to be proven wrong on this one. 2) Split up index collection into smaller portions and distribute a certain number of indices (~ up to 1000 indices) into smaller autonomous clusters, that are completely responsible for their collection of indices. Problem: How do I keep index distribution dynamic so I don't have to hardcode where to look for a certain index (that's not a real lucene issue, but more one of distributed computing, but nevertheless I thought you guys might know a way to solve it). Any ideas on this? Has anyone ever worked with such a large number of Lucene indices (and the amount of data it involves)? I appreciate your help very much. Cheers Benjamin --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]