Kee Nethery wrote:
> On Jun 25, 2009, at 11:39 PM, Stefan Behnel wrote:
>> parsing a
>> document from a string does not have its own function, because it is
>> trivial to write
>>
>>    tree = parse(BytesIO(some_byte_string))
>
> :-) Trivial for someone familiar with the language. For a newbie like
> me, that step was non-obvious.

I actually meant the code complexity, not the fact that you need to know
BytesIO to do the above.

>> If what you meant is actually parsing from a byte string, this is easily
>> done using BytesIO(), or StringIO() in Py2.x (x<6).
>
> Yes, thanks! Looks like BytesIO is a v.3.x enhancement.

It should be available in 2.6 AFAIR, simply as an alias for StringIO.

> Looks like the
> StringIO does what I need since all I'm doing is pulling the unicode
> string into et.parse.

As I said, this won't work, unless you are either

a) passing a unicode string with plain ASCII characters in Py2.x

or

b) confusing UTF-8 and Unicode

>>> theXmlDataTree = et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))
>>
>> This will not work because ET cannot parse from unicode strings (unless
>> they only contain plain ASCII characters and you happen to be using
>> Python 2.x). lxml can parse from unicode strings, but it requires that
>> the XML must not have an encoding declaration (which would render it
>> non well-formed). This is convenient for parsing HTML, it's less
>> convenient for XML usually.
>
> Right for my example, if the data is coming in as UTF-8 I believe I can do:
> theXmlDataTree = et.parse(StringIO.StringIO(theXmlData), encoding='utf-8')

Yes, although in this case you are not parsing a unicode string but a UTF-8
encoded byte string. Plus, passing 'UTF-8' as encoding to the parser is
redundant, as it is the default for XML.

Stefan
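
For readers following the thread, here is a minimal sketch of the byte-string case discussed above. It assumes Python 3 and the standard library's xml.etree.ElementTree; the sample document and all variable names are made up for illustration and do not come from the original posts. The point is that the parser is handed a UTF-8 encoded byte string (wrapped in BytesIO for parse()), and that no encoding argument is passed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document containing one non-ASCII character.
xml_text = '<?xml version="1.0" encoding="UTF-8"?><root><name>Zo\u00eb</name></root>'

# Encode the text so that the parser receives a UTF-8 byte string,
# which is the situation described in the last reply of the thread.
xml_bytes = xml_text.encode("utf-8")

# parse() wants a file name or file-like object, so wrap the bytes in BytesIO.
tree = ET.parse(io.BytesIO(xml_bytes))
name_from_parse = tree.getroot().findtext("name")

# fromstring() accepts the byte string directly, but returns the root
# element rather than an ElementTree.
root = ET.fromstring(xml_bytes)
name_from_fromstring = root.findtext("name")

# Both routes should yield the same decoded text.
print(name_from_parse == name_from_fromstring)
```

In both calls the encoding is taken from the XML declaration (or defaults to UTF-8 when there is none), so nothing extra needs to be passed to the parser.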