Re: Lucene or Nutch ?

Bruno Grilheres Wed, 05 Apr 2006 09:00:10 -0700

Thanks for your answer; I was not aware of the Solr project.

There was a big typo here: I meant less than 10 GB of PDF files per day during one month, i.e. less than 300 GB of PDF files. I made some tests with PDF files: 100 MB of native PDF is converted to 3 MB of index in Lucene [the text was indexed but not stored].
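
As a rough sketch of what these figures imply, the arithmetic can be written out as below. It assumes a 30-day month and that the index/PDF ratio measured on the 100 MB test stays the same at full volume; neither assumption has been verified, and the function name is only illustrative.

```python
# Back-of-the-envelope estimate from the figures quoted in this message.
# Assumptions (not measured): a 30-day month, and an index size that
# scales linearly with the amount of PDF data.

daily_pdf_gb = 10       # upper bound: less than 10 GB of PDF per day
days = 30               # "one month", assumed to be 30 days
sample_pdf_mb = 100     # test corpus: 100 MB of native PDF
sample_index_mb = 3     # resulting Lucene index (text indexed, not stored)


def estimate_index_gb(total_pdf_gb, pdf_mb, index_mb):
    """Scale the measured index/PDF ratio linearly to the full corpus."""
    return total_pdf_gb * index_mb / pdf_mb


total_pdf_gb = daily_pdf_gb * days
index_gb = estimate_index_gb(total_pdf_gb, sample_pdf_mb, sample_index_mb)

print(f"total PDF: {total_pdf_gb} GB, estimated index: {index_gb:.0f} GB")
```

Under those assumptions the upper bound is 300 GB of PDF and an index of about 9 GB. The real ratio may differ for scanned PDFs or if fields are stored.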


Bruno

Yonik Seeley wrote:

On 4/5/06, Bruno Grilheres <[EMAIL PROTECTED]> wrote:

1) High-volume data indexing, but only with add and delete
functionality (approximately 10 PDF) => a scalable architecture; HDFS
seems good.
2) A specific analysis chain and indexing of a given set of metadata.
3) Language recognition.
4) No graphical interface for searching is needed and no crawling is
needed; indexing and search are performed with HTTP requests to a servlet.

What is the best starting choice for this : Lucene or Nutch ?

As far as I know Lucene is a good choice for 2 and 4, Nutch is a better
choice for 1 and 3.


Solr would also be good for 2 and 4.
As for 1, what kind of scalability requirements are we talking about? (#
documents, size of docs, etc.)

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
