Hi,

I have the following setup (planned architecture):

[Webserver - APACHE]
which serves the content

[unspecified other servers]

[CMS Server / SearchEngine - TOMCAT]
handles content creation and publishing to the webserver,
and indexes the content stored on the Apache machine


The Tomcat machine should index the Apache server, and maybe some other
servers, via a cron job. Search requests from the webserver are forwarded
to the search engine on the Tomcat machine.

Indexing HTML files has priority; PDF, Word and similar formats would be
very nice to have.
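
To make the HTML case concrete, this is roughly the kind of work I mean,
as a toy sketch in plain Java. It is not Lucene or Nutch code: the class
name ToyIndexer, its methods and the example URL are made up for
illustration, and the regex-based tag stripping is an assumption that a
real indexer would replace with a proper HTML parser.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> URLs of pages containing it.
class ToyIndexer {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Crude text extraction: drop script/style blocks, then all tags.
    static String stripTags(String html) {
        return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                   .replaceAll("(?s)<[^>]*>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Index one fetched page under its URL.
    void add(String url, String html) {
        String text = stripTags(html).toLowerCase(Locale.ROOT);
        for (String term : text.split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
            }
        }
    }

    // Look up a single term; returns an empty set if it was never seen.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                     Collections.<String>emptySet());
    }
}
```

In the planned setup, the cron job would call add() for every page fetched
from the Apache machine, and the forwarded search requests would end up in
search(). PDF and Word files would need their own text extraction step
before something like add() could be used.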


1.) Which search engine (meaning Lucene-based implementation) would be the
best choice for such a situation? In other words: what's the difference
between the following?
          - Lucene
          - Nutch

2.) Are there other search engines which are better for solving this issue?
3.) Do I have to write my own indexer (one that parses HTML, PDF...) or are
there useful templates/indexers available?
4.) Does anybody know an alternative to Zilverline
(http://www.zilverline.org/) that is free for commercial use?

TIA,
david
