On 02/11/2014 21:59, Simon Evans wrote:
Oh, I don't mind quoting console output; I just thought I'd be sparing you
unnecessary detail.
The output was going nicely as I typed in the code from my 'Getting Started with
Beautiful Soup', even where the author reckoned things would go wrong because
lxml was not installed. Things went right because I had already installed
it, re:
----------------------------------------------------------------------------
page 17
----------------------------------------------------------------------------
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = "http://www.packtpub.com/books"
>>> page = urllib2.urlopen(url)
>>> soup_packtpage = BeautifulSoup(page)
>>> with open("foo.html","r") as foo_file:
... soup_foo = Soup(foo_file)
  File "<stdin>", line 2
    soup_foo = Soup(foo_file)
           ^
IndentationError: expected an indented block
>>> soup_foo= BeautifulSoup("foo.html")
----------------------------------------------------------------------------
page 18
----------------------------------------------------------------------------
>>> print(soup_foo)
<html><body><p>foo.html</p></body></html>
>>> soup_url = BeautifulSoup("http://www.packtpub.com/books")
>>> print(soup_url)
<html><body><p>http://www.packtpub.com/books</p></body></html>
>>> helloworld = "<p>Hello World</p>"
>>> soup_string = BeautifulSoup(helloworld)
>>> print(soup_string)
<html><body><p>Hello World</p></body></html>
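Two things in the page 17 and 18 output are worth spelling out. The line under `with` has to be indented at the `...` prompt (and the class is `BeautifulSoup`, not `Soup`). Also, `BeautifulSoup("foo.html")` parses the literal string rather than opening the file, which is why the filename and the URL come back wrapped in `<p>` tags. A minimal sketch of the difference, assuming bs4 is installed; the temporary file and the explicit "html.parser" argument are illustrative choices, not something taken from the book:

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample file standing in for the book's foo.html.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "foo.html")
with open(path, "w") as out_file:
    out_file.write("<p>Hello World</p>")

# The body of the with block is indented, and the open file object
# (not its name) is handed to BeautifulSoup.
with open(path, "r") as foo_file:
    soup_from_file = BeautifulSoup(foo_file, "html.parser")

# A bare string is treated as markup, so only the literal text is parsed.
# Recent bs4 releases may also print a warning that it looks like a filename.
soup_from_name = BeautifulSoup("foo.html", "html.parser")

print(soup_from_file.p.get_text())  # text that was read from the file
print(soup_from_name.get_text())    # just the filename string itself
```

The same applies to the URL: it needs to be fetched first, as the `urllib2.urlopen(url)` line on page 17 does, and the response passed in.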
----------------------------------------------------------------------------
page 19: no code in text on this page
----------------------------------------------------------------------------
page 20
----------------------------------------------------------------------------
>>> soup_xml = BeautifulSoup(helloworld,features= "xml")
>>> soup_xml = BeautifulSoup(helloworld,"xml")
>>> print(soup_xml)
<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>
>>> soup_xml = BeautifulSoup(helloworld,features = "xml")
>>> print(soup_xml)
<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>
----------------------------------------------------------------------------
Then at the bottom of page 20 it says 'we should install the required parsers using
easy-install,pip or setup.py install', but as I can't get the downloads of the html
or html5 parsers, the code from the text halfway down returns the standard error
about the requisite parser needing to be installed, re:
----------------------------------------------------------------------------
page 21
----------------------------------------------------------------------------
>>> invalid_html = '<a invalid content'
>>> soup_invalid_html = BeautifulSoup(invalid_html,'lxml')
>>> print(soup_invalid_html)
<html><body><a content="" invalid=""></a></body></html>
>>> soup_invalid_html = BeautifulSoup(invalid_html,'html5lib')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 155, in __init__
    % ",".join(features))
ValueError: Couldn't find a tree builder with the features you requested:
html5lib. Do you need to install a parser library?
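The ValueError above is raised as soon as the requested tree builder cannot be found, so it can be caught and a different parser tried instead. A rough sketch, assuming bs4 is installed; `make_soup` and its preference order are hypothetical and not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup


def make_soup(markup, preferred=("html5lib", "lxml", "html.parser")):
    """Return (soup, parser_name) using the first parser that is available.

    make_soup is a hypothetical helper, not part of Beautiful Soup.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except ValueError:
            # bs4 raises ValueError (or a subclass of it) when the
            # requested parser library is not installed.
            continue
    raise ValueError("none of the requested parsers is installed")


soup, used = make_soup("<p>Hello World</p>")
print(used)
print(soup.p.get_text())
```

Different parsers can build different trees from broken markup such as `<a invalid content`, so a fallback like this is a convenience and not a substitute for installing the parser the book expects.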
Have you tried this from the command prompt?
pip install html5lib
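Once pip has finished, a quick way to confirm what Python can actually see is to try importing the parser packages. A small sketch using only the standard library; `is_importable` is a hypothetical helper name:

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# lxml and html5lib are the optional parser packages mentioned in this thread.
for name in ("lxml", "html5lib"):
    status = "installed" if is_importable(name) else "missing"
    print("%s: %s" % (name, status))
```

If html5lib still shows as missing after the install, one possible cause is that pip installed it for a different Python than the one running the check.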
And please do something about the extra newlines and single-line
paragraphs above; there's no need for any of it.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence