How to pick content from HTML using BeautifulSoup

Sheetal Singh Mon, 09 Jul 2012 21:07:17 -0700

Hi,

I am new to Python. I need to fetch the names of the side filters from a page
and save them to a CSV file (please find the attached screenshot).


Following is a snippet from my code:

    soup = BeautifulStoneSoup(html)
    # for e in soup.findAll('div'):
    #     for c in e.findAll('h3'):
    #         for d in c.findAll('li'):
    #             print '@@@@@@@', d.extract()

    # select_pod = soup.findAll('div', {"class": "win aboutUs"})
    # promeg = select_pod[0].findAll("p")[0]

    # for dv in soup.findAll('div', {"class": "attribution"}):
    #     ds = dv.findAll("<h3>")
    #     print ds

    select_pod = soup.findAll('div')
    print select_pod
    for j in select_pod:
        if j is not None:
            print j.findall('a')
    promeg = select_pod.findAll("<h3>")
    # print '--', promeg

    # hreflist = [each.get('value') for each in soup.findAll('<h3>')]

    for m in promeg:
        if m:
            print 'Data values', m
            fd1.writerow([x[2], m, i[0], "Data Found"])


Structure of the HTML:

<div class="attribution">
<div>
<h3>By Brand</h3>
<ul>
<li>
<a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a>
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<h3>By Seller</h3>
<ul>
<li>
<a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
<input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<div>
</div>
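
For the structure above, one possible approach is sketched here. Two details in the posted snippet are likely to get in the way: findAll expects a bare tag name such as 'h3' rather than '<h3>', and it has to be called on a single tag, not on the list that findAll('div') returns (the method name is also findAll, not findall). The sketch below is not the poster's code: it assumes Python 3 and the newer bs4 package instead of the BeautifulSoup 3 API used in the post, and the names SAMPLE_HTML and extract_filters are hypothetical. The sample is a trimmed copy of the posted HTML with the empty <li> items left out.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the posted structure (assumption: empty <li> items omitted).
SAMPLE_HTML = """
<div class="attribution">
<div>
<h3>By Brand</h3>
<ul>
<li><a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a></li>
</ul>
</div>
<div>
<h3>By Seller</h3>
<ul>
<li><a id="att_296935_184059" class="attributeUrlReplacementTarget"
href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon
 Marketplace</a></li>
</ul>
</div>
</div>
"""


def extract_filters(html):
    """Return a list of (heading, [names]) pairs, one per <h3> filter group."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    for box in soup.find_all("div", class_="attribution"):
        for heading in box.find_all("h3"):
            # The filter links live in the <ul> that follows the <h3>.
            ul = heading.find_next_sibling("ul")
            names = []
            if ul is not None:
                for link in ul.find_all("a"):
                    # Collapse whitespace: link text may wrap across lines.
                    names.append(" ".join(link.get_text().split()))
            groups.append((heading.get_text(strip=True), names))
    return groups


filters = extract_filters(SAMPLE_HTML)
for heading, names in filters:
    print(heading, names)
```

Pairing each heading with its own list of names keeps the groups separate, which makes it straightforward to write them out one group at a time later.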


Output required in the CSV:

By Brands
Nokia
Samsung
.
.

By Seller
Amazon
Buy.com
.
.
.
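
A layout like the one above (a heading, then one name per row, with a blank row between groups) could be produced with the standard csv module. This is only a sketch under assumptions: it targets Python 3, write_filters is a hypothetical helper name, and the groups list is hard-coded here from the names shown in the required output rather than scraped from the page.

```python
import csv
import io


def write_filters(groups, out):
    """Write each (heading, names) group as a single-column block of rows."""
    writer = csv.writer(out, lineterminator="\n")
    for index, (heading, names) in enumerate(groups):
        if index:
            writer.writerow([])  # blank row between groups
        writer.writerow([heading])
        for name in names:
            writer.writerow([name])


# Illustrative data taken from the required output in the post.
groups = [
    ("By Brand", ["Nokia", "Samsung"]),
    ("By Seller", ["Amazon", "Buy.com"]),
]

buffer = io.StringIO()
write_filters(groups, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, a file opened with open("filters.csv", "w", newline="") can be passed as out; the file name is only an example.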



Please suggest how to fetch these details.

Sheetal Singh

<<attachment: filters.png>>

