Wait... it's because curTag is set to "" in the end element handler, so the whitespace that follows each closing tag gets stored under the "" key of the dict.
That doesn't explain why it does it on an XML file containing no
whitespace, unless it's counting newlines.

Is there a way to just ignore whitespace and/or XML comments?

On 5/23/07, kaens <[EMAIL PROTECTED]> wrote:
> Hey everyone, this may be a stupid question, but I noticed the
> following and as I'm pretty new to using xml and python, I was
> wondering if I could get an explanation.
>
> Let's say I write a simple xml parser, for an xml file that just loads
> the content of each tag into a dict (the xml file doesn't have
> multiple hierarchies in it, it's flat other than the parent node)
>
> so we have
> <parent>
> <option1>foo</option1>
> <option2>bar</option2>
> . . .
> </parent>
>
> (I'm using xml.parsers.expat)
> the parser sets a flag that says it's in the parent, and sets the
> value of the current tag it's processing in the start tag handler.
> The character data handler sets a dictionary value like so:
>
> dictName[curTag] = data
>
> after I'm done processing the file, I print out the dict, and the first value
> is
> <a few bits of whitespace> : <a whole bunch of whitespace>
>
> There are comments in the xml file - is this what is causing this?
> There are also blank lines. . .but I don't see how a blank line would
> be interpreted as a tag. Comments though, I could see that happening.
>
> Actually, I just did a test on an xml file that had no comments or
> whitespace and got the same behaviour.
>
> If I feed it the following xml file:
>
> <options>
> <one>hey</one>
> <two>bee</two>
> <three>eff</three>
> </options>
>
> it prints out:
> " :
>
> three : eff
> two : bee
> one : hey"
>
> wtf.
>
> For reference, here's the handler functions:
>
> def handleCharacterData(self, data):
>     if self.inOptions and self.curTag != "options":
>         self.options[self.curTag] = data
>
> def handleStartElement(self, name, attributes):
>     if name == "options":
>         self.inOptions = True
>     if self.inOptions:
>         self.curTag = name
>
>
> def handleEndElement(self, name):
>     if name == "options":
>         self.inOptions = False
>     self.curTag = ""
>
> Sorry if the whitespace in the code got mangled (fingers crossed...)
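
For what it's worth, here is a minimal sketch of one way to skip the whitespace between tags, assuming the flat <options> layout from the quoted message. The class name OptionsParser, the chunks buffer, and the parse() helper are made up for illustration and are not from the original code. expat passes the newlines between tags to the character data handler like any other text, and it does not pass comments to that handler at all, so buffering text only while inside a child element should be enough:

```python
import xml.parsers.expat


class OptionsParser:
    """Collects the text of each child of <options> into a dict."""

    def __init__(self):
        self.options = {}
        self.inOptions = False
        self.curTag = None  # None means "not inside a child element"
        self.chunks = []    # pieces of text seen for the current element

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        elif self.inOptions:
            self.curTag = name
            self.chunks = []

    def handleCharacterData(self, data):
        # expat reports the newlines between tags here too, and may split
        # one run of text over several calls, so only buffer text while
        # inside a child element.
        if self.curTag is not None:
            self.chunks.append(data)

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        elif self.curTag is not None:
            # Store the joined text once the element is complete.
            self.options[self.curTag] = "".join(self.chunks).strip()
            self.curTag = None

    def parse(self, text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.handleStartElement
        parser.EndElementHandler = self.handleEndElement
        parser.CharacterDataHandler = self.handleCharacterData
        parser.Parse(text, True)
        return self.options


sample = """<options>
<!-- a comment -->
<one>hey</one>
<two>bee</two>

<three>eff</three>
</options>"""

options = OptionsParser().parse(sample)
print(options)
```

Using None instead of "" for "no current tag" means the text between elements never has a key to land under, and joining the chunks in the end element handler avoids losing text if expat delivers it in more than one piece.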