Hi,
I am trying to follow the instructions at
http://lucene.apache.org/nutch/tutorial8.html
.....
Intranet: Configuration
To configure things for intranet crawling you must:
1. Create a directory with a flat file of root urls. For example, to
crawl the nutch site you might start with a file named
urls/nutch containing the url of just the Nutch home
page. All other Nutch pages should be reachable from this page. The
urls/nutch file would thus contain:
http://lucene.apache.org/nutch/
....
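If I read the quoted step correctly, it only asks for a directory named urls containing a plain text file named nutch, with the start URL inside. A minimal sketch of that in Python follows. The helper name write_seed_file is hypothetical (it is not part of Nutch), and I am assuming one URL per line and that the directory lives wherever the crawl is started from:

```python
from pathlib import Path


def write_seed_file(base_dir, urls):
    """Create <base_dir>/urls/nutch holding the given root URLs.

    Hypothetical helper for illustration only; not part of Nutch.
    Assumption: the flat file holds one URL per line.
    """
    seed_dir = Path(base_dir) / "urls"
    # Create the "urls" directory if it does not exist yet.
    seed_dir.mkdir(parents=True, exist_ok=True)
    seed_file = seed_dir / "nutch"
    # Write each root URL on its own line.
    seed_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return seed_file


if __name__ == "__main__":
    # The single root URL given in the tutorial excerpt.
    path = write_seed_file(".", ["http://lucene.apache.org/nutch/"])
    print(path.read_text(encoding="utf-8"), end="")
```

The same result can be had by creating the directory and file by hand in a text editor; the script only makes the expected layout explicit.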
I do not understand this step of the tutorial. Can anyone help me out?
Thanks.
zhou