Hi, I have the following setup (planned architecture):

[Webserver - APACHE]
  - serves the content

[unspecified other servers]

[CMS Server / SearchEngine - TOMCAT]
  - handles content creation and publishing to the webserver
  - indexes the content stored on the Apache machine

The Tomcat machine should index the Apache machine, and maybe some other servers, via a cron job. Search requests from the webserver are forwarded to the search engine on the Tomcat machine. Indexing HTML files has priority; PDF, Word and similar formats would be very nice to have.

1.) Which search engine (meaning Lucene-based implementation) would be the best choice for such a situation? In other words, what's the difference between:
    - Lucene
    - Nutch

2.) Are there other search engines that are better suited to this problem?

3.) Do I have to write my own indexer (one that parses HTML, PDF...) or are there useful templates/indexers available?

4.) Does anybody know a free alternative (for commercial use) to Zilverline (http://www.zilverline.org/)?

TIA,
david