Lawrence Oluyede wrote:
> If the content producer doesn't provide the full article via RSS/ATOM
> there's no way you can get it from there. Search for full content feeds
> if any, otherwise get the article URL and feed it to BeautifulSoup to
> scrape the content.
For the same feed (where the content producer doesn't provide the full article!) I was able to see the complete post in other RSS aggregators (like Blam). I wanted to know how they were able to collect the full post.

I am sure that aggregators can't do screen scraping separately for each and every blog, so there has to be a standard way, or at least blogs must maintain a standard template for rendering posts. If every site offered only partial content and the rest had to be scraped from a page with a non-standard structure (which is more likely), then IMHO it would become impossible for any aggregator to aggregate feeds!

I shall try BeautifulSoup for now, though I'm still doubtful about it.

--
_ _ _]{5pitph!r3}[_ _ _
__________________________________________________
“I'm smart enough to know that I'm dumb.”
  - Richard P Feynman
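A note on the question above: before scraping, it may be worth checking whether the feed already carries the full post in an element the reading code is not looking at. RSS feeds can put the complete body in `content:encoded` (from the RSS content module) while `description` holds only a teaser. Whether Blam relies on this, or whether this particular feed does so, is an assumption and not something established in this thread. A minimal sketch using only the standard library, with a made-up sample feed and hypothetical helper names (`best_body`, `scan`):

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical feed, made up for illustration: the first item carries the
# full post in <content:encoded>, the second only a teaser.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
      <description>Short teaser</description>
      <content:encoded><![CDATA[<p>The complete first post.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second</link>
      <description>Only a teaser here</description>
    </item>
  </channel>
</rss>"""


def best_body(item):
    """Return (kind, text): the full content if present, else the summary."""
    full = item.findtext(CONTENT_NS + "encoded")
    if full:
        return "full", full
    return "summary", item.findtext("description", default="")


def scan(feed_xml):
    """Yield (title, link, kind, text) for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        kind, text = best_body(item)
        yield item.findtext("title"), item.findtext("link"), kind, text


if __name__ == "__main__":
    for title, link, kind, text in scan(SAMPLE_FEED):
        print(title, kind, text)
```

Items that come back as "summary" are the ones where fetching the link and scraping the page (for example with BeautifulSoup) would still be needed.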