On Tue, Sep 24, 2013 at 6:19 PM, Dhananjay Nene <dhananjay.n...@gmail.com> wrote:
> On Tue, Sep 24, 2013 at 6:11 PM, Dhananjay Nene
> <dhananjay.n...@gmail.com> wrote:
> > On Tue, Sep 24, 2013 at 6:04 PM, Dhananjay Nene
> > <dhananjay.n...@gmail.com> wrote:
> >> On Tue, Sep 24, 2013 at 5:48 PM, Vineet Naik <naik...@gmail.com> wrote:
> >>> Hi,
> >>>
> >>> On Tue, Sep 24, 2013 at 10:38 AM, bab mis <bab...@outlook.com> wrote:
> >>>
> >>>> Hi, is there any XML parser which gives the same kind of data
> >>>> structure as the yaml parser gives in Python? Tried with
> >>>> xml.dom.minidom but it's not a proper data structure; every time
> >>>> I need to read element by element and create the dict.
> >>>>
> >>>
> >>> You can try xmltodict[1]. It also retains the node attributes and makes
> >>> them accessible using the '@' prefix (see the example in the README of
> >>> the repo).
> >>>
> >>> [1]: https://github.com/martinblech/xmltodict
> >>
> >> Being curious I immediately took a look and tried the following:
> >>
> >> import xmltodict
> >>
> >> doc1 = xmltodict.parse("""
> >> <mydocument has="an attribute">
> >>   <and>
> >>     <many>elements</many>
> >>     <many>more elements</many>
> >>   </and>
> >>   <plus a="complex">
> >>     element as well
> >>   </plus>
> >> </mydocument>
> >> """)
> >>
> >> doc2 = xmltodict.parse("""
> >> <mydocument has="an attribute">
> >>   <and>
> >>     <many>more elements</many>
> >>   </and>
> >>   <plus a="complex">
> >>     element as well
> >>   </plus>
> >> </mydocument>
> >> """)
> >> print(doc1['mydocument']['and'])
> >> print(doc2['mydocument']['and'])
> >>
> >> The output was:
> >> OrderedDict([(u'many', [u'elements', u'more elements'])])
> >> OrderedDict([(u'many', u'more elements')])
> >>
> >> The only difference is that there is only one "many" node inside the
> >> "and" node in doc2. Do you see an issue here? (At least I do.) The
> >> output structure is a function of the cardinality of the inner nodes:
> >> it changes shape from a list of many elements to just one bare
> >> element rather than a list of one (throwing away the list). That can
> >> make things rather unpredictable.
> >> You cannot predict upfront whether the existence of just one node
> >> inside a parent node is consistent with the XML schema or is just
> >> applicable in that particular instance.
> >>
> >> I do think the problem is tractable so long as one clearly documents
> >> the specific constraints which the underlying XML will satisfy,
> >> constraints which make transformations to lists or dicts safe.
> >> Trying to make it easy without clearly documenting the constraints
> >> could lead to violations of the principle of least surprise like
> >> the one above.
> >>
>

The README does mention that it's based on this "spec"[1] (or rather a
blog post) that lists the assumptions. But it seems to be missing a lot of
documentation in general as well.

Out of curiosity I looked into the code to see if the author has left any
comments about this inconsistency (the value type varying between lists and
unicode/OrderedDict). While there are no such comments, I found that the
`parse` function can take a keyword arg `dict_constructor`, so any other
dict-like structure can be used instead of OrderedDict. For example, to
force every node to be inside a list irrespective of the cardinality:

import xmltodict
from collections import defaultdict

doc2 = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""", dict_constructor=lambda *a, **k: defaultdict(list))

>>> doc2
defaultdict(<type 'list'>, {u'mydocument': [defaultdict(<type 'list'>, {u'and': [defaultdict(<type 'list'>, {u'many': [u'more elements']})], u'plus': [defaultdict(<type 'list'>, {'#text': [u'element as well'], u'@a': u'complex'})], u'@has': u'an attribute'})]})
>>> doc2['mydocument'][0]['and'][0]['many']
[u'more elements']

Of course, defaultdict would lead to the order of nodes being lost, but an
"OrderedDefaultDict" (never tried before :-)) might work.

> > It gets even more interesting, e.g. below:
> >
> > doc3 = xmltodict.parse("""
> > <mydocument has="an attribute">
> >   <and>
> >     <many>elements</many>
> >   </and>
> >   <plus a="complex">
> >     element as well
> >   </plus>
> >   <and>
> >     <many>more elements</many>
> >   </and>
> > </mydocument>
> > """)
> >
> > print(doc3['mydocument']['and'])
> >
> > leads to the output:
> >
> > [OrderedDict([(u'many', u'elements')]), OrderedDict([(u'many', u'more
> > elements')])]
> >
> > Definitely not what would be naively expected.
>
> Correction:
>
> print(doc3['mydocument'])
>
> prints
>
> OrderedDict([(u'@has', u'an attribute'), (u'and',
> [OrderedDict([(u'many', u'elements')]), OrderedDict([(u'many', u'more
> elements')])]), (u'plus', OrderedDict([(u'@a', u'complex'), ('#text',
> u'element as well')]))])
>
> which just trashed the ordering of an "and" followed by a "plus" followed
> by an "and".
>

This is a more serious problem, particularly if the dict is required to be
serialized back to XML.

Thanks for pointing out these issues, I had missed them entirely :-)

[1]: http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

- Vineet

> Dhananjay
>
> --
>
> ----------------------------------------------------------------------------------------------------------------------------------
> http://blog.dhananjaynene.com twitter: @dnene google plus:
> http://gplus.to/dhananjaynene
> _______________________________________________
> BangPypers mailing list
> BangPypers@python.org
> https://mail.python.org/mailman/listinfo/bangpypers
>

--
Vineet Naik
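
For illustration, here is a minimal sketch of one way to sidestep both
issues raised in this thread (the shape changing with cardinality, and the
sibling order being lost). It does not use xmltodict at all; it assumes the
standard library's xml.etree.ElementTree, and the helper names
`element_to_node` and `children_named` are hypothetical, made up for this
example. Every node gets the same three keys, and children are always kept
as a list of (tag, node) pairs in document order.

```python
import xml.etree.ElementTree as ET


def element_to_node(elem):
    """Convert an Element into a dict with a fixed shape.

    Children are always a list of (tag, node) pairs in document order,
    so the shape does not depend on how many siblings share a tag.
    (Hypothetical helper, not part of xmltodict.)
    """
    return {
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [(child.tag, element_to_node(child)) for child in elem],
    }


def children_named(node, tag):
    """Return every child node with the given tag, always as a list."""
    return [child for name, child in node["children"] if name == tag]


doc3 = element_to_node(ET.fromstring("""
<mydocument has="an attribute">
  <and><many>elements</many></and>
  <plus a="complex">element as well</plus>
  <and><many>more elements</many></and>
</mydocument>
"""))

# Document order of the top-level children is preserved.
print([name for name, _ in doc3["children"]])  # ['and', 'plus', 'and']

# A single <many> still comes back as a one-element list.
first_and = children_named(doc3, "and")[0]
print([m["text"] for m in children_named(first_and, "many")])  # ['elements']
```

The trade-off is that lookups by tag become a linear scan instead of a dict
access, which is probably acceptable for small documents but is a design
choice rather than a recommendation.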