Hmm, ElementTree.tostring() also adds a space between the last character of the element name and the />. Not sure why it is doing this.
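To make this concrete, here is a minimal sketch that reproduces the behavior and shows one possible workaround. It assumes Python 3's xml.etree.ElementTree. The helper name tighten_self_closing is made up for illustration and is not part of ElementTree.

```python
import re
import xml.etree.ElementTree as ET


def tighten_self_closing(xml_text):
    """Remove whitespace before '/>' (hypothetical helper, not an ElementTree API)."""
    # Naive post-processing: it would also rewrite a literal ' />' that
    # appears inside a comment or similar, so treat it as a sketch only.
    return re.sub(r"\s+/>", "/>", xml_text)


# Build <root><frame type="image"/></root> in memory.
root = ET.Element("root")
ET.SubElement(root, "frame", type="image")

# encoding="unicode" asks tostring() for a str instead of bytes.
serialized = ET.tostring(root, encoding="unicode")

print(serialized)                        # serializer output, as-is
print(tighten_self_closing(serialized))  # same text without the space before />
```

Both forms are equivalent XML, so the cleanup only matters if something downstream compares the text byte for byte.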
Something like <root/> will become <root /> after tostring().

On 9/25/07, Robert Dailey <[EMAIL PROTECTED]> wrote:
>
> One thing I noticed is that it is placing an arbitrary space between " and
> />. For example:
>
> <root><frame type="image" /></root>
>
> Notice that there's a space between "image" and /></root>
>
> Any way to fix this? Thanks.
>
> On 9/24/07, Gabriel Genellina <[EMAIL PROTECTED]> wrote:
> >
> > On Mon, 24 Sep 2007 23:51:57 -0300, Robert Dailey <[EMAIL PROTECTED]>
> > wrote:
> >
> > > What I meant was that it's not an option because I'm trying to learn
> > > regular expressions. RE is just as built in as anything else.
> >
> > Ok, let's analyze what you want. You have for instance this text:
> > "<action></action>"
> > which should become
> > "<action/>"
> >
> > You have to match:
> > (opening angle bracket)(any word)(closing angle bracket)(opening angle
> > bracket)(slash)(same word as before)(closing angle bracket)
> >
> > This translates rather directly into this regular expression:
> >
> > r"<(\w+)></\1>"
> >
> > where \w+ means "one or more alphanumeric characters or _", and being
> > surrounded in () creates a group (group number one), which is
> > back-referenced as \1 to express "same word as before".
> > The matched text should be replaced by (opening <)(the word
> > found)(slash)(closing >), that is: r"<\1/>"
> > Using the sub function in module re:
> >
> > py> import re
> > py> source = """
> > ... <root></root>
> > ... <root/>
> > ... <root><frame type="image"><action></action></frame></root>
> > ... <root><frame type="image"><action/></frame></root>
> > ... """
> > py> print(re.sub(r"<(\w+)></\1>", r"<\1/>", source))
> >
> > <root/>
> > <root/>
> > <root><frame type="image"><action/></frame></root>
> > <root><frame type="image"><action/></frame></root>
> >
> > Now, a more complex example, involving tags with attributes:
> > <frame type="image"></frame> --> <frame type="image" />
> >
> > You have to match:
> > (opening angle bracket)(any word)(any sequence of words, spaces, other
> > symbols, but NOT a closing angle bracket)(closing angle bracket)(opening
> > angle bracket)(slash)(same word as before)(closing angle bracket)
> >
> > r"<(\w+)([^>]*)></\1>"
> >
> > [^>] means "anything but a >", the * means "may occur many times, maybe
> > zero", and it's enclosed in () to create group 2.
> >
> > py> source = """
> > ... <root></root>
> > ... <root><frame type="image"></frame></root>
> > ... """
> > py> print(re.sub(r"<(\w+)([^>]*)></\1>", r"<\1\2 />", source))
> >
> > <root />
> > <root><frame type="image" /></root>
> >
> > Next step would be to allow whitespace wherever it is legal to appear -
> > left as an exercise to the reader. Hint: use \s*
> >
> > --
> > Gabriel Genellina
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
-- http://mail.python.org/mailman/listinfo/python-list
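
P.S. On the exercise Gabriel left at the end of his message, here is one possible sketch, not necessarily what he had in mind. It takes his second pattern and adds \s* before the closing > of each tag, where XML allows whitespace. Whitespace between the two tags is deliberately not matched, because there it counts as element content. The names EMPTY_PAIR and collapse_empty are made up for illustration, and the replacement writes "/>" with no space in front of it, unlike the r"<\1\2 />" used in the quoted session.

```python
import re

# Gabriel's pattern with \s* allowed just before the '>' of the opening
# tag and of the closing tag. Group 2 is lazy so that trailing whitespace
# in the opening tag is consumed by \s* and not kept in the attributes.
EMPTY_PAIR = re.compile(r"<(\w+)([^>]*?)\s*></\1\s*>")


def collapse_empty(xml_text):
    """Rewrite <tag ...></tag> as <tag .../> (hypothetical helper)."""
    return EMPTY_PAIR.sub(r"<\1\2/>", xml_text)


if __name__ == "__main__":
    print(collapse_empty('<root><frame type="image" ></frame ></root>'))
    print(collapse_empty("<root> </root>"))  # whitespace content: left unchanged
```

Like the original, this is still a regular expression and not a parser: an attribute value containing ">" or a tag name with a namespace prefix will not be handled.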