Write my own crawler VS use nutch?

spamsucks Fri, 26 Jan 2007 13:03:36 -0800

I am successfully using Lucene in our application to index 12 different types of objects located in a database, and their relationships to each other, to provide some nice search functionality for our website. We are building lots of Lucene queries programmatically to filter based upon categories, regions, zip codes, scoring, lat/longs...

My problem is that we have a lot of content that is not in the database (about 3000+ pages) that we also need to include in the search results. It's a whole lot of JSPs.


As I see it, I can either:
a) Migrate this application to Nutch

b) Write a web crawler to crawl our site and inject the crawl results into our Lucene index.

I am leaning towards option B (write our own crawler), since I think it would only take me a couple of days to write a simple crawler and I wouldn't have to change much else.
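
To make option B concrete, here is a rough sketch of the kind of crawl loop I have in mind. It is only an illustration under some assumptions: the class and method names (SimpleCrawler, extractLinks, crawl) are made up for this post, links are found with a naive regex rather than a real HTML parser, and the crawl stays on the starting host. Fetching a page and adding it to the index are passed in as callbacks, so nothing here is actual Lucene API.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCrawler {

    // Naive href matcher (assumption: good enough for our own pages).
    // It stops at '#', so fragments are dropped from the captured link.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns every href found in the page, resolved against the page's URL. */
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).normalize().toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl starting at startUrl, restricted to its host.
     * fetcher returns the page body for a URL, or null if it cannot be fetched.
     * sink receives (url, body) for each fetched page; this is where the
     * page would be handed to the indexing code.
     * Returns the set of same-host URLs that were discovered.
     */
    static Set<String> crawl(String startUrl,
                             Function<String, String> fetcher,
                             BiConsumer<String, String> sink) {
        String host = URI.create(startUrl).getHost();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // unreachable page: nothing to index
            }
            sink.accept(url, html);
            for (String link : extractLinks(url, html)) {
                String linkHost = URI.create(link).getHost();
                // Follow only links on our own host, and each URL only once.
                if (host != null && host.equals(linkHost) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }
}
```

In real use the fetcher would do an HTTP GET against our site and the sink would build a document from the page text and add it to the existing index. Things this sketch leaves out, and that would eat into the "couple of days" estimate: stripping markup, politeness delays, redirects, and re-crawling changed pages.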

Can anyone think of any points/counterpoints for using Nutch vs. writing a crawler to extend our already used Lucene framework?


Thanks.



