On Apr 1, 6:28 pm, 7stud <[EMAIL PROTECTED]> wrote:
> On Apr 1, 5:25 pm, 7stud <[EMAIL PROTECTED]> wrote:
>
> > You can treat a tag like a dictionary to obtain a specific attribute:
> >
> > import BeautifulSoup as bs
> >
> > html = "<div x='a' y='b' z='c'>hello</div>"
> >
> > doc = bs.BeautifulSoup(html)
> > div = doc.find("div")
> > print div
> > print div["x"]
> >
> > --output:--
> > a
> >
> > But you can't iterate over a tag to get all the attributes:
> >
> > import BeautifulSoup as bs
> >
> > html = "<div x='a' y='b' z='c'>hello</div>"
> >
> > doc = bs.BeautifulSoup(html)
> > div = doc.find("div")
> >
> > for key in div:
> >     print key, div[key]
> >
> > --output:--
> > hello
> > Traceback (most recent call last):
> >   File "test1.py", line 9, in ?
> >     print key, div[key]
> >   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/BeautifulSoup.py", line 430, in __getitem__
> >     return self._getAttrMap()[key]
> > KeyError: u'hello'
> >
> > How can you get all the attributes when you don't know the attribute
> > names ahead of time?
>
> I figured it out:
>
> import BeautifulSoup as bs
>
> html = "<div x='a' y='b' z='c'>hello</div>"
>
> doc = bs.BeautifulSoup(html)
> div = doc.find("div")
>
> for attr, val in div.attrs:
>     print "%s:%s" % (attr, val)
>
> --output:--
> x:a
> y:b
> z:c
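Iterating over a BeautifulSoup tag walks its children (here the text node "hello"), not its attributes, which is why div[key] raised KeyError: u'hello'. The attributes are kept separately in attrs: a list of (name, value) pairs in the BeautifulSoup 3 used above, and a dict in BeautifulSoup 4 (so there the loop would be "for attr, val in div.attrs.items():"). For comparison, here is a minimal sketch of the same idea using only Python 3's standard html.parser module instead of BeautifulSoup. AttrCollector and tag_attributes are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: record the attributes of every start tag named `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target.lower()  # html.parser reports tag names in lowercase
        self.found = []  # one dict of attributes per matching tag

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs in document order,
        # so no attribute name has to be known ahead of time.
        if tag == self.target:
            self.found.append(dict(attrs))


def tag_attributes(html, target):
    """Return a list of attribute dicts, one per `target` tag found in `html`."""
    parser = AttrCollector(target)
    parser.feed(html)
    parser.close()
    return parser.found


html = "<div x='a' y='b' z='c'>hello</div>"
for attrs in tag_attributes(html, "div"):
    for name, value in attrs.items():
        print("%s:%s" % (name, value))  # prints x:a, y:b, z:c, one per line
```

Because html.parser lowercases tag and attribute names, tag_attributes(html, "DIV") matches the same element.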
Just for another data point, here's how it looks with pyparsing.

-- Paul

from pyparsing import makeHTMLTags, SkipTo

html = """<div x="a" y="b" z="c">hello</div>"""

# HTML tags match case-insensitively
divStart, divEnd = makeHTMLTags("DIV")
divTag = divStart + SkipTo(divEnd)("body") + divEnd

for div in divTag.searchString(html):
    print div.dump()
    print

    # dict-like access to results
    for k in div.keys():
        print k, div[k]

    # object.attribute access to results
    print div.body
    print div.x
    print div.y
    print

Prints:

['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False, 'hello', '</DIV>']
- body: hello
- empty: False
- endDiv: </DIV>
- startDiv: ['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False]
  - empty: False
  - x: a
  - y: b
  - z: c
- x: a
- y: b
- z: c

body hello
endDiv </DIV>
y b
x a
z c
startDiv ['DIV', ['x', 'a'], ['y', 'b'], ['z', 'c'], False]
empty False
hello
a
b
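The pyparsing results above can be read two ways: div[k] like a dictionary, and div.body or div.x like object attributes. A minimal plain-Python 3 sketch of that dual-access idea follows. Results is a hypothetical class, not pyparsing's ParseResults (which is implemented differently), and it is filled in by hand here rather than by a parser.

```python
class Results(dict):
    """Hypothetical dict whose keys can also be read as attributes.

    Mimics the two access styles in the example above: results["x"] and results.x.
    """

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails, so real
        # dict methods such as keys() still take priority over stored keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


div = Results(x="a", y="b", z="c", body="hello")

# dict-like access to results
for k in div.keys():
    print(k, div[k])

# object.attribute access to results
print(div.body)
print(div.x)
print(div.y)
```

One design consequence: a stored key that collides with a dict method name (for example "keys") is reachable only through div["keys"], never as div.keys.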