Robert wrote:
Stefan Behnel wrote:
Robert, 31.01.2010 20:57:
I tried lxml, but after walking and making changes in the element tree,
I'm forced to do a full serialization of the whole document
(etree.tostring(tree)), which destroys the "human edited" format of the
original HTML code and makes it rather unreadable.
What do you mean? Could you give an example? lxml certainly does not
destroy anything it parsed, unless you tell it to do so.
Of course it does not destroy anything during parsing (?).
I mean: I want to walk through the parsed HTML tree with a Python script
and modify things here and there (automatic alt attributes from a DB or similar,
link corrections, text sections/translated sentences... as a result of HTML
code and content checks).
Then I want to output the changed tree, but as close to the original
format as possible: no changes to my whitespace indentation,
etc. Only local changes, where tags were really changed.
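One way to picture "only local changes": keep the original source text and splice each change into it at known character offsets, so everything outside the edited spans (indentation, quoting, line breaks) is copied unchanged. This is a minimal sketch. It assumes the edits are already available as `(start, end, replacement)` tuples that refer to offsets in the original text. Both that tuple format and the helper name `apply_local_edits` are hypothetical.

```python
def apply_local_edits(source, edits):
    """Apply (start, end, replacement) edits to the original source text.

    Offsets refer to the ORIGINAL text. Edits are applied from the end of
    the text towards the beginning, so the offsets of earlier edits stay
    valid. Overlapping edits are rejected.
    """
    result = source
    limit = len(source)  # start of the most recently applied (later) edit
    for start, end, replacement in sorted(edits, reverse=True):
        if not 0 <= start <= end <= limit:
            raise ValueError(f"invalid or overlapping edit: {(start, end)}")
        result = result[:start] + replacement + result[end:]
        limit = start
    return result


original = "<p>\n  <a href='old.html'>Link</a>\n</p>\n"
pos = original.index("old.html")
edited = apply_local_edits(original, [(pos, pos + len("old.html"), "new.html")])
print(edited)  # only the href value differs; the indentation is untouched
```

Applying edits back to front is what keeps this simple. No offset bookkeeping is needed as long as all positions come from the unmodified text.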
That's similar to what a good HTML editor does: after you make
small changes, it doesn't reformat/re-spit-out your whole code layout
from the tree/attribute logic alone. You get local changes only.
But a simple HTML editor like the one in Mozilla SeaMonkey outputs a whole
new HTML document, generated from the logical tree alone (in its own (ugly)
style). It destroys my whitespace layout and much more, forgetting
everything about the original layout.
Such a "good HTML editor" must somehow track the original positions of
the tags in the file, and with each logical change in the tree it must
track the resulting changes in file positions/offsets. That seems to be
missing in lxml and BeautifulSoup, which I have tried so far.
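For the position tracking itself, Python's standard-library parser already exposes part of what is needed. `html.parser.HTMLParser.getpos()` returns the (line, column) of the construct being handled, and `get_starttag_text()` returns the exact original text of the most recent start tag. The rough sketch below records the line number and absolute `start:end` offsets of every start tag. The class name `TagPositionRecorder` is hypothetical and not part of any library. The sketch covers start tags only, not end tags or text.

```python
from html.parser import HTMLParser


class TagPositionRecorder(HTMLParser):
    """Record the line number and original start:end offsets of each start tag."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        # Offset of the first character of each line (lines are 1-based),
        # counting "\n" the same way HTMLParser does.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.tags = []  # (tag, attrs, lineno, start, end) in document order

    def handle_starttag(self, tag, attrs):
        lineno, col = self.getpos()  # position of the tag's "<"
        start = self.line_starts[lineno - 1] + col
        raw = self.get_starttag_text()  # exact original text of the tag
        self.tags.append((tag, dict(attrs), lineno, start, start + len(raw)))

    def run(self):
        self.feed(self.source)
        self.close()
        return self.tags


doc = '<html>\n  <body>\n    <img src="a.png">\n  </body>\n</html>\n'
for tag, attrs, lineno, start, end in TagPositionRecorder(doc).run():
    print(f"line {lineno}: <{tag}> at {start}:{end} {doc[start:end]!r}")
```

This produces a flat list of positions rather than a tree. Attaching these offsets to lxml or BeautifulSoup nodes would still take extra work.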
This is a frequent need for me. Does nobody else have it?
It seems I need to write my own, or patch BS to do that extra tracking?
Basic feature(s) of such a parser, perhaps:
* Can it tell, for each tag object in the parsed tree, at what
original file position start:end it resided? Even more basic:
tell me the line number, e.g. for warning/analysis reports.
* (Do the tree objects automatically track/know whether they were changed?
For convenience; otherwise a tree copy may serve this.)
Creating an output with local changes would be rather
simple from that ...
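The features listed above could be combined roughly like this. The sketch finds `<img>` tags without an `alt` attribute, reports them by line number, and inserts the missing attribute directly into the original text without touching anything else. It rests on a few assumptions. `lookup_alt` stands in for the DB lookup mentioned earlier, and it and every class/function name here are hypothetical. The insertion point is simply the end of the original tag text, just before `>` or `/>`.

```python
from html import escape
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect <img> start tags that lack an alt attribute, with positions."""

    def __init__(self, source):
        super().__init__()
        # Offset of the first character of each line, counted by "\n"
        # exactly as HTMLParser counts lines.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.hits = []  # (lineno, start, end, src) in document order
        self.feed(source)
        self.close()

    def handle_starttag(self, tag, attrs):
        # Also called for "<img ... />", via the default handle_startendtag.
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            lineno, col = self.getpos()  # position of the "<"
            start = self.line_starts[lineno - 1] + col
            end = start + len(self.get_starttag_text())
            self.hits.append((lineno, start, end, attrs.get("src") or ""))


def add_missing_alts(source, lookup_alt):
    """Return (fixed_source, warnings); only the offending tags are edited."""
    result = source
    warnings = []
    # Work backwards so offsets of earlier tags in `source` stay valid.
    for lineno, start, end, src in reversed(MissingAltFinder(source).hits):
        warnings.append(f"line {lineno}: <img src={src!r}> has no alt")
        tag_text = source[start:end]
        # Drop the closing ">" or "/>" and any whitespace before it.
        body = tag_text[:-2] if tag_text.endswith("/>") else tag_text[:-1]
        insert_at = start + len(body.rstrip())
        new_attr = f' alt="{escape(lookup_alt(src), quote=True)}"'
        result = result[:insert_at] + new_attr + result[insert_at:]
    warnings.reverse()  # report in document order
    return result, warnings


ALT_TEXTS = {"logo.png": "Company logo", "dog.jpg": "A dog"}  # hypothetical DB stand-in
page = (
    "<p>\n"
    '  <img src="logo.png">\n'
    '  <img src="cat.jpg" alt="A cat">\n'
    '    <img src="dog.jpg" />\n'
    "</p>\n"
)
fixed, warnings = add_missing_alts(page, lambda src: ALT_TEXTS.get(src, ""))
print("\n".join(warnings))
print(fixed)
```

Because every change is computed from offsets in the unmodified text, the indentation and the untouched `<img src="cat.jpg" alt="A cat">` come out exactly as they went in. This sketch does not attempt the "did this node change?" tracking from the second point.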
Robert
--
http://mail.python.org/mailman/listinfo/python-list