Oh, I don't mind quoting console output; I just thought I'd be sparing you
unnecessary detail. Output was going nicely as I typed in the code from my
'Getting Started with Beautiful Soup' book. Even where the author reckoned
things would go wrong because lxml was not installed, things went right,
because I had already installed it:

----------------------------------------------------------------------------
page 17
----------------------------------------------------------------------------
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win 32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = "http://www.packtpub.com/books"
>>> page = urllib2.urlopen(url)
>>> soup_packtpage = BeautifulSoup(page)
>>> with open("foo.html","r") as foo_file:
... soup_foo = Soup(foo_file)
  File "<stdin>", line 2
    soup_foo = Soup(foo_file)
           ^
IndentationError: expected an indented block
>>> soup_foo= BeautifulSoup("foo.html")
----------------------------------------------------------------------------
page 18
----------------------------------------------------------------------------
>>> print(soup_foo)
<html><body><p>foo.html</p></body></html>
>>> soup_url = BeautifulSoup("http://www.packtpub.com/books")
>>> print(soup_url)
<html><body><p>http://www.packtpub.com/books</p></body></html>
>>> helloworld = "<p>Hello World</p>"
>>> soup_string = BeautifulSoup(helloworld)
>>> print(soup_string)
<html><body><p>Hello World</p></body></html>
----------------------------------------------------------------------------
page 19: no code in text on this page
----------------------------------------------------------------------------
page 20
----------------------------------------------------------------------------
>>> soup_xml = BeautifulSoup(helloworld,features= "xml")
>>> soup_xml = BeautifulSoup(helloworld,"xml")
>>> print(soup_xml)
<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>
>>> soup_xml = BeautifulSoup(helloworld,features = "xml")
>>> print(soup_xml)
<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>
>>>
----------------------------------------------------------------------------

Then at the bottom of page 20 it says 'we should install the required parsers
using easy-install,pip or setup.py install', but as I can't get the downloads
of the html or html5 parsers, the code halfway down the next page returns the
standard response about the requisite parser needing to be installed:

----------------------------------------------------------------------------
page 21
----------------------------------------------------------------------------
>>> invalid_html = '<a invalid content'
>>> soup_invalid_html = BeautifulSoup(invalid_html,'lxml')
>>> print(soup_invalid_html)
<html><body><a content="" invalid=""></a></body></html>
>>> soup_invalid_html = BeautifulSoup(invalid_html,'html5lib')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 155, in __init__
    % ",".join(features))
ValueError: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
>>>
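The ValueError in the page 21 session appears when Beautiful Soup is asked for a parser whose backing module cannot be imported. As a minimal sketch (not anything from the book), one way to avoid the traceback is to check which parser modules are importable and fall back to one that is. The names PARSER_MODULES, is_installed and pick_parser are made up for illustration; the parser names themselves ("html.parser", "lxml", "xml", "html5lib") are the ones Beautiful Soup accepts.

```python
import importlib

# Beautiful Soup parser name -> third-party module it relies on.
# None means the parser ships with Python itself, so nothing extra is needed.
# (Hypothetical helper table, not part of bs4.)
PARSER_MODULES = {
    "html.parser": None,
    "lxml": "lxml",
    "xml": "lxml",
    "html5lib": "html5lib",
}


def is_installed(module_name):
    """Return True if module_name can be imported in this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def pick_parser(preferred):
    """Return the first parser name in `preferred` that looks usable.

    Unknown names are skipped; if nothing in the list is usable,
    fall back to the built-in "html.parser".
    """
    for name in preferred:
        if name not in PARSER_MODULES:
            continue
        module_name = PARSER_MODULES[name]
        if module_name is None or is_installed(module_name):
            return name
    return "html.parser"


# Ask for html5lib first, then lxml, then the built-in parser.
parser = pick_parser(["html5lib", "lxml", "html.parser"])
print(parser)
```

The chosen name could then be passed as the second argument, as in BeautifulSoup(invalid_html, parser). On a machine like the one in the session, where lxml is installed and html5lib is not, this should pick "lxml" instead of raising. Note that different parsers can build different trees from the same invalid markup, so the printed soup may not match the book's html5lib output.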