Tim wrote on 20.11.2014 at 18:31:
> On Thursday, November 20, 2014 12:04:09 PM UTC-5, Denis McMahon wrote:
>>> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>>>> So what I'm looking for is a method to create an html5 document using
>>>> "dom manipulation", ie:
>>>>
>>>> doc = new htmldocument(doctype="HTML")
>>>> html = new html5element("html")
>>>> doc.appendChild(html)
>>>> head = new html5element("head")
>>>> html.appendChild(head)
>>>> body = new html5element("body")
>>>> html.appendChild(body)
>>>> title = new html5element("title")
>>>> txt = new textnode("This Is The Title")
>>>> title.appendChild(txt)
>>>> head.appendChild(title)
>>>> para = new html5element("p")
>>>> txt = new textnode("This is some text.")
>>>> para.appendChild(txt)
>>>> body.appendChild(para)
>>>>
>>>> print(doc.serialise())
>>>>
>>>> generates:
>>>>
>>>> <!doctype HTML><html><head><title>This Is The Title</title></head>
>>>> <body><p>This is some text.</p></body></html>
>>>>
>>>> I'm finding various mechanisms to generate the structure from an
>>>> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
>>>> seem to find any mechanism to generate, manipulate and produce html5
>>>> documents using this dom manipulation approach. Where should I be
>>>> looking?
>>
>> Everything there seems to assume I'll be creating a document serially, eg
>> that I won't get to some point in the document and decide that I want to
>> add an element earlier.
>>
>> bs4 and html5lib will parse a document into a tree structure, but they're
>> not so hot on manipulating the tree structure, eg adding and moving nodes.
>>
>> Actually it looks like bs4 is going to be my best bet, although limited
>> it does have most of what I'm looking for. I just need to start by giving
>> it "<html></html>" to parse.
>
> I believe lxml should work for this. Here's a snippet that I have used to
> create an HTML document:
>
> from lxml import etree
> page = etree.Element('html')
> doc = etree.ElementTree(page)
>
> head = etree.SubElement(page, 'head')
> body = etree.SubElement(page, 'body')
> table = etree.SubElement(body, 'table')
>
> etc etc
>
> with open('mynewfile.html', 'wb') as f:
>     doc.write(f, pretty_print=True, method='html')
>
> (you can leave out the method= option to get xhtml).
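
On the concern about adding or moving elements after the fact: the tree lxml builds is an ordinary in-memory structure, so nodes can be inserted earlier or relocated at any time. A minimal sketch, assuming lxml is installed; the element text follows the example in the original question, while the doctype string and the extra paragraph are illustrative assumptions:

```python
from lxml import etree

# Start with only the body, as if the head had been forgotten.
page = etree.Element('html')
body = etree.SubElement(page, 'body')
para = etree.SubElement(body, 'p')
para.text = 'This is some text.'

# Later: add a <head> *before* the existing <body>.
head = etree.Element('head')
page.insert(0, head)
title = etree.SubElement(head, 'title')
title.text = 'This Is The Title'

# Moving a node: inserting an element that already has a parent
# relocates it, because an lxml element has exactly one parent.
note = etree.SubElement(body, 'p')
note.text = 'Added last, shown first.'
body.insert(0, note)

# Serialise as HTML; the doctype string here is an assumption.
html = etree.tostring(page, method='html', encoding='unicode',
                      doctype='<!DOCTYPE html>')
print(html)
```

Elements behave like lists of their children, so slicing, `remove()` and `index()` are available as well.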

There's also the E-factory for creating (sub-)trees in a nicely objectish
way:

http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

and the just released lxml 3.4.1 has an "htmlfile" context manager that
allows you to generate HTML incrementally:

http://lxml.de/api.html#incremental-xml-generation

Obviously, you can combine both, so you can create a subtree in memory and
write it into an incrementally built HTML stream. Pretty versatile.

Stefan
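
A rough sketch of combining the two approaches mentioned above, assuming lxml 3.4.1 or later: a table is built in memory with the E-factory and then written into a stream produced by the "htmlfile" context manager. The table contents and the use of an in-memory BytesIO instead of a real file are assumptions made for illustration only:

```python
import io

from lxml import etree
from lxml.html import builder as E

# Build a subtree in memory with the E-factory.
table = E.TABLE(
    E.TR(E.TH('Name'), E.TH('Value')),
    E.TR(E.TD('answer'), E.TD('42')),
)

# Generate the surrounding document incrementally.
out = io.BytesIO()
with etree.htmlfile(out, encoding='utf-8') as hf:
    with hf.element('html'):
        with hf.element('head'):
            with hf.element('title'):
                hf.write('This Is The Title')
        with hf.element('body'):
            with hf.element('p'):
                hf.write('This is some text.')
            # Write the pre-built subtree into the stream.
            hf.write(table)

html = out.getvalue().decode('utf-8')
print(html)
```

The incremental writer only ever moves forward, so anything that may still need to be rearranged is better kept as an in-memory subtree until it is written.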