just crawling is super easy. it's the indexing and searching that is hard.
just start at yahoo.com, scrape out all the links, and then for every
page you reach, visit every link on it.
i wrote a crawler in 15 lines of code, but all it did was
visit the sites; it didn't index them or anything.
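
to give an idea of what "visit every link" looks like, here is a rough sketch using only the standard library. it is not the 15-line crawler i mentioned, just an illustration. the names LinkParser, fetch, crawl and the max_pages limit are made up for this example, and it ignores robots.txt, politeness delays and indexing entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (hypothetical helper for this sketch)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first visit from start_url; returns the URLs visited, in order."""
    seen = {start_url}          # every URL ever queued, so nothing is visited twice
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue            # skip pages that fail to download
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # resolve relative links and drop "#fragment" parts
            link = urldefrag(urljoin(url, href)).url
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

calling crawl("http://yahoo.com") would walk outward from that page until max_pages pages have been fetched. passing your own fetch_page function lets you try the link-following logic without touching the network.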

you could probably write a faster one in C++, but if you are new to this,
doing it in Python will let you experiment and learn faster.

some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html
http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/spider_simpleCrawler.asp