Ok, I can fix it by modifying

    if self.inOptions and self.curTag != "options":

to

    if self.inOptions and self.curTag != "options" and self.curTag != "":

but this feels really freaking ugly.

Sigh. Any suggestions? I know I must be missing something.

Also, I hate the tendency I have to figure stuff out shortly after
posting to a mailing list or forum. Happens all the time, and I swear
I don't solve stuff until I ask for help.

On 5/23/07, kaens <[EMAIL PROTECTED]> wrote:
> Wait. . . it's because the curTag is set to "", thus it sets the
> whitespace after a tag to that part of the dict.
>
> That doesn't explain why it does it on a xml file containing no
> whitespace, unless it's counting newlines.
>
> Is there a way to just ignore whitespace and/or xml comments?
>
> On 5/23/07, kaens <[EMAIL PROTECTED]> wrote:
> > Hey everyone, this may be a stupid question, but I noticed the
> > following and as I'm pretty new to using xml and python, I was
> > wondering if I could get an explanation.
> >
> > Let's say I write a simple xml parser, for an xml file that just loads
> > the content of each tag into a dict (the xml file doesn't have
> > multiple hierarchies in it, it's flat other than the parent node)
> >
> > so we have
> > <parent>
> >   <option1>foo</option1>
> >   <option2>bar</option2>
> >   . . .
> > </parent>
> >
> > (I'm using xml.parsers.expat)
> > the parser sets a flag that says it's in the parent, and sets the
> > value of the current tag it's processing in the start tag handler.
> > The character data handler sets a dictionary value like so:
> >
> > dictName[curTag] = data
> >
> > after I'm done processing the file, I print out the dict, and the first
> > value is
> > <a few bits of whitespace> : <a whole bunch of whitespace>
> >
> > There are comments in the xml file - is this what is causing this?
> > There are also blank lines. . .but I don't see how a blank line would
> > be interpreted as a tag. Comments though, I could see that happening.
> >
> > Actually, I just did a test on an xml file that had no comments or
> > whitespace and got the same behaviour.
> >
> > If I feed it the following xml file:
> >
> > <options>
> > <one>hey</one>
> > <two>bee</two>
> > <three>eff</three>
> > </options>
> >
> > it prints out:
> > " :
> >
> > three : eff
> > two : bee
> > one : hey"
> >
> > wtf.
> >
> > For reference, here's the handler functions:
> >
> > def handleCharacterData(self, data):
> >     if self.inOptions and self.curTag != "options":
> >         self.options[self.curTag] = data
> >
> > def handleStartElement(self, name, attributes):
> >     if name == "options":
> >         self.inOptions = True
> >     if self.inOptions:
> >         self.curTag = name
> >
> >
> > def handleEndElement(self, name):
> >     if name == "options":
> >         self.inOptions = False
> >     self.curTag = ""
> >
> > Sorry if the whitespace in the code got mangled (fingers crossed...)
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
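
For what it's worth, here is a minimal sketch of one way to ignore the
whitespace between tags without the extra != "" comparison. It is not
the original code: the class name OptionsParser and the parse() method
are made up for illustration, and it assumes the same flat <options>
layout as the example above. The idea is to use None as the "not inside
a child tag" marker and only store character data while a child tag is
open. expat reports the newlines between elements as character data, and
it may also deliver the text of one element in several calls, so the
sketch appends to the value instead of overwriting it.

```python
import xml.parsers.expat


class OptionsParser:
    """Collect the text of each child of <options> into a dict (hypothetical helper)."""

    def __init__(self):
        self.options = {}
        self.inOptions = False
        self.curTag = None  # None means "not inside a child element"

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        elif self.inOptions:
            self.curTag = name
            self.options[name] = ""

    def handleCharacterData(self, data):
        # Newlines and indentation between elements arrive while curTag
        # is None, so they are skipped here.
        if self.curTag is not None:
            # Append, because expat may split one text run into several calls.
            self.options[self.curTag] += data

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        self.curTag = None

    def parse(self, text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.handleStartElement
        parser.CharacterDataHandler = self.handleCharacterData
        parser.EndElementHandler = self.handleEndElement
        parser.Parse(text, True)
        return self.options
```

XML comments need no special handling in this sketch, since no comment
handler is registered and comments are not passed to the character data
handler. If whitespace inside a value matters, the appended text can be
left as is; otherwise calling .strip() on each value afterwards is an
option.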