On Sun, May 30, 2010 at 9:56 PM, JAGANADH G <jagana...@gmail.com> wrote:
> Dear All,
>
> I was trying to run Harvestman (a Python tool for web harvesting).
> I got the following error:
> http://pastebin.com/uPzUs0Xw
>
> My configuration file is http://pastebin.com/dfhiy2Q6
>
> Can anybody help me regarding this?
>
> I was trying to harvest my blog with a word filter 'Python'
>

There is no word filter anymore. You hit upon a bug which seems to
still apply the word-filter code :)

For filtering based on words or regular expressions on the page
content, you can implement a custom crawler. It is pretty easy and a
sample already exists. Look for "searchingcrawler.py" inside the
apps/samples folder and modify the code to suit the keyword(s) you
want to filter.

>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
> _______________________________________________
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>

--
--Anand
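
For anyone reading this thread later: the word or regular-expression filtering on page content mentioned in the reply can be sketched on its own, independent of HarvestMan. The snippet below is only an illustration of the matching logic and does not use HarvestMan's API or reproduce "searchingcrawler.py"; the names build_word_filter and filter_pages and the example URLs are hypothetical. Something like it could presumably be called from whatever hook a custom crawler exposes for fetched page content.

```python
import re


def build_word_filter(keywords, ignore_case=True):
    """Return a predicate that is True when text contains any keyword as a whole word.

    Hypothetical helper for illustration; not part of HarvestMan.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    flags = re.IGNORECASE if ignore_case else 0
    # Escape each keyword so it is matched literally, and require word
    # boundaries so that "Python" does not match inside "Pythonic".
    alternatives = "|".join(re.escape(word) for word in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", flags)

    def matches(page_text):
        return pattern.search(page_text) is not None

    return matches


def filter_pages(pages, keywords):
    """Given a dict of url -> page text, return the URLs whose text matches."""
    matches = build_word_filter(keywords)
    return [url for url, text in pages.items() if matches(text)]


if __name__ == "__main__":
    # Made-up pages standing in for crawled content.
    pages = {
        "http://example.org/a": "Notes on Python decorators",
        "http://example.org/b": "A post about cooking",
    }
    print(filter_pages(pages, ["Python"]))
```

With the made-up pages above, only the first URL is kept. To filter on an arbitrary regular expression instead of whole words, the compiled pattern could be passed in directly rather than built from a keyword list.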