Simon Evans wrote:

> Dear Mark Lawrence, thank you for your advice.
> I take it that I use the input you suggest for the line :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")
>
> seeing as I have to give the file's full address I therefore have to
> modify your :
>
> soup = BeautifulSoup(ecological_pyramid,"lxml")
>
> to :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")
>
> otherwise I get :
>
>>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>>> ecological_pyramid: soup = BeautifulSoup(ecological_pyramid,"lxml")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'ecological_pyramid' is not defined
>
>
> so anyway with the input therefore as:
>
>>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful
>>>>> Soup\ecological_pyramid,","lxml") producer_entries = soup.find("ul")
>>>>> print(producer_entries.li.div.string)
No. If you pass the filename, Beautiful Soup will mistake it for the HTML itself. You can verify that in the interactive interpreter:

>>> soup = BeautifulSoup(r"C:\Beautiful Soup\ecologicalpyramid.html", "lxml")
>>> soup
<html><body><p>C:\Beautiful Soup\ecologicalpyramid.html</p></body></html>

You have to pass an open file to BeautifulSoup, not a filename:

>>> with open(r"C:\Beautiful Soup\ecologicalpyramid.html", "r") as f:
...     soup = BeautifulSoup(f, "lxml")
...

However, if you look at the data returned by soup.find("ul"), you'll see

>>> producer_entries = soup.find("ul")
>>> producer_entries
<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>

The first <li>...</li> node does not contain a <div>:

>>> producer_entries.li
<li class="producers">
</li>

and thus

>>> producer_entries.li.div is None
True

so asking for .string on that None raises an AttributeError, which is expected with the given data. Returning None is Beautiful Soup's way of indicating that the <li> node has no <div> child at all.

If you want to process the first <li> that does have a <div> child, a straightforward way is to iterate over the <li> nodes:

>>> for li in producer_entries.find_all("li"):
...     if li.div is not None:
...         print(li.div.string)
...         break  # remove if you want all, not just the first
...
plants

Taking a second look at the data, you probably want the <li> nodes with class="producerlist":

>>> for li in soup.find_all("li", attrs={"class": "producerlist"}):
...     print(li.div.string)
...
plants
algae

-- 
https://mail.python.org/mailman/listinfo/python-list
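A self-contained sketch of the three points made in the reply (a path string is parsed as markup, the first <li> has no <div>, and filtering on class="producerlist" finds the wanted nodes), assuming the bs4 package is installed. Further assumptions: the HTML constant is only the <ul> fragment printed in the reply, not the full ecologicalpyramid.html; io.StringIO stands in for the file object returned by open(); the parser is Python's built-in "html.parser" instead of "lxml", so lxml is not required (for this small, well-formed fragment it should build the same tree, although unlike lxml it does not wrap a bare string in <html><body><p>); and first_producer_name() and producer_names() are hypothetical helper names chosen for illustration.

```python
from io import StringIO

from bs4 import BeautifulSoup

# Assumed stand-in for ecologicalpyramid.html: only the <ul> fragment shown
# in the reply, not the complete file.
HTML = """
<ul id="producers">
  <li class="producers"></li>
  <li class="producerlist">
    <div class="name">plants</div>
    <div class="number">100000</div>
  </li>
  <li class="producerlist">
    <div class="name">algae</div>
    <div class="number">100000</div>
  </li>
</ul>
"""


def first_producer_name(markup_or_file):
    """Hypothetical helper: text of the first <div> inside any <li> of the first <ul>."""
    soup = BeautifulSoup(markup_or_file, "html.parser")  # html.parser instead of lxml
    producer_entries = soup.find("ul")
    if producer_entries is None:
        return None
    for li in producer_entries.find_all("li"):
        # Skip <li> nodes without a <div> child; li.div is None for those.
        if li.div is not None:
            return str(li.div.string)
    return None


def producer_names(markup_or_file):
    """Hypothetical helper: the first <div> text of every <li class="producerlist">."""
    soup = BeautifulSoup(markup_or_file, "html.parser")
    return [str(li.div.string) for li in soup.find_all("li", attrs={"class": "producerlist"})]


if __name__ == "__main__":
    # 1. A path string is treated as markup, so there is no <ul> to find.
    path_soup = BeautifulSoup(r"C:\Beautiful Soup\ecologicalpyramid.html", "html.parser")
    print(path_soup.find("ul"))  # None

    # 2. A file-like object (StringIO here, open(...) in real use) is read and parsed.
    file_soup = BeautifulSoup(StringIO(HTML), "html.parser")
    print(file_soup.find("ul").li.div is None)  # True: the first <li> has no <div>

    print(first_producer_name(StringIO(HTML)))  # plants
    print(producer_names(StringIO(HTML)))  # ['plants', 'algae']
```

Wrapping .string in str() keeps the returned values plain strings rather than Beautiful Soup's NavigableString objects, which makes them easier to compare and print.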