Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7

Mark Lawrence Sun, 02 Nov 2014 14:34:01 -0800

On 02/11/2014 21:59, Simon Evans wrote:


Oh I don't mind quoting console output, I just thought I'd be sparing you

unnecessary detail.

output was going nicely as I input text from my 'Getting Started with

Beautiful Soup' even when the author reckoned things would go wrong - due to

lxml not being installed, things went right, because I had already installed

it, re:
----------------------------------------------------------------------------
page 17
----------------------------------------------------------------------------
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

import urllib2
from bs4 import BeautifulSoup
url = "http://www.packtpub.com/books"
page = urllib2.urlopen(url)
soup_packtpage = BeautifulSoup(page)
with open("foo.html","r") as foo_file:

... soup_foo = Soup(foo_file)
   File "<stdin>", line 2
     soup_foo = Soup(foo_file)
            ^
IndentationError: expected an indented block

soup_foo= BeautifulSoup("foo.html")

----------------------------------------------------------------------------
page 18
----------------------------------------------------------------------------

print(soup_foo)

<html><body><p>foo.html</p></body></html>

soup_url = BeautifulSoup("http://www.packtpub.com/books")
print(soup_url)

<html><body><p>http://www.packtpub.com/books</p></body></html>

helloworld = "<p>Hello World</p>"
soup_string = BeautifulSoup(helloworld)
print(soup_string)

<html><body><p>Hello World</p></body></html>
----------------------------------------------------------------------------
page 19: no code in text on this page
----------------------------------------------------------------------------
page 20
----------------------------------------------------------------------------

soup_xml = BeautifulSoup(helloworld,features= "xml")
soup_xml = BeautifulSoup(helloworld,"xml")
print(soup_xml)

<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>

soup_xml = BeautifulSoup(helloworld,features = "xml")
print(soup_xml)

<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>

----------------------------------------------------------------------------
Then at the bottom of page 20 it says 'we should install the required parsers using
easy_install, pip or setup.py install', but as I can't get the downloads of the html
or html5 parsers, the test code halfway down returns the standard response about the
required parser needing to be installed, re:
----------------------------------------------------------------------------
page 21
----------------------------------------------------------------------------

invalid_html = '<a invalid content'
soup_invalid_html = BeautifulSoup(invalid_html,'lxml')
print(soup_invalid_html)

<html><body><a content="" invalid=""></a></body></html>

soup_invalid_html = BeautifulSoup(invalid_html,'html5lib')

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Python27\lib\site-packages\bs4\__init__.py", line 155, in __init__
     % ",".join(features))
ValueError: Couldn't find a tree builder with the features you requested: 
html5lib. Do you need to install a parser library?


Have you tried this from the command prompt?

pip install html5lib
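
Before retrying the page 21 example, it may help to confirm which parser libraries this interpreter can actually import. A minimal sketch, assuming only the standard library; PARSER_MODULES, available_parsers and pick_parser are names made up for illustration and are not part of bs4:

```python
import importlib

# Beautiful Soup feature name -> module that must be importable for it.
# "html.parser" ships with Python itself, so it needs no extra module.
PARSER_MODULES = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
    ("html.parser", None),
]


def available_parsers():
    """Return the feature names whose backing module can be imported."""
    found = []
    for feature, module in PARSER_MODULES:
        if module is None:
            found.append(feature)
            continue
        try:
            importlib.import_module(module)
        except ImportError:
            # Not installed for this interpreter; skip it.
            continue
        found.append(feature)
    return found


def pick_parser(preferred):
    """Return `preferred` if it is installed, else the first available one."""
    parsers = available_parsers()
    return preferred if preferred in parsers else parsers[0]


print(available_parsers())
```

If html5lib is missing from the printed list, pip has not installed it for this Python. The result of pick_parser('html5lib') can be passed as the second argument to BeautifulSoup, e.g. BeautifulSoup(invalid_html, pick_parser('html5lib')).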

And please do something about the extra newlines and single-line paragraphs above, there's no need for any of it.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
