Re: Questions about XML processing?

Hernán De Angelis Fri, 06 Nov 2020 12:57:47 -0800

Thank you Terry, Dan and Dieter for encouraging me to post here. I have already solved the problem, albeit with a not so efficient solution. Perhaps it is useful to present it here anyway, in case some light can be shed on this.

My job is to parse a complicated XML (ISO metadata) and pick up the values of certain fields under certain conditions. This goes well for the most part. I am working with xml.etree.ElementTree, which proved sufficient for the most part and for the rest of the project. JSON is not an option within this project.

The specific trouble was in this section, itself the child of a more complicated parent (for simplicity, tags are renamed and namespaces removed):


          <tagA>
            <tagB>
              <tagC>
                <string>Something</string>
              </tagC>
              <tagC>
                <string>Something else</string>
              </tagC>
              <tagC>
                <note>
                  <title>
                    <string>value</string>
                  </title>
                  <date0>
                    <date1>
                      <date2>
                        <gco:Date>2020-11-06</gco:Date>
                      </date2>
                      <dateType>
                        <code blah lots of strange things blah />
                      </dateType>
                    </date1>
                  </date0>
                </note>
              </tagC>
            </tagB>
          </tagA>

Basically, I have to get what is in tagC/string, but only if the value of tagC/note/title/string is "value". As you see, there are several tagC, all children of tagB, but tagC can have different meanings(!). And no, I have no control over how these XML fields are constructed.


In principle it is easy to make a "findall" and get strings for tagC, using:

elem.findall("./tagA/tagB/tagC/string")

and then get the content and append it in case there is more than one tagC/string, like: "Something, Something else".

However, the hard thing to do here is to get those only when tagC/note/title/string='value'. I was expecting to find a way of specifying a certain construction in square brackets, like [@string='value'] or [@/tagC/note/title/string='value'], as is usual in XML and possible in xml.etree. However, this proved difficult (at least for me). So this is the "brute" solution I implemented:
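As a minimal sketch of that first step, the snippet below runs the same findall on a trimmed-down copy of the sample and joins the results. The `root` wrapper and the `SAMPLE` string are assumptions made only so the example is self-contained; the real document has a more complicated parent and namespaces.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample above. "root" is a hypothetical wrapper
# standing in for the more complicated parent element.
SAMPLE = """
<root>
  <tagA>
    <tagB>
      <tagC><string>Something</string></tagC>
      <tagC><string>Something else</string></tagC>
      <tagC><note><title><string>value</string></title></note></tagC>
    </tagB>
  </tagA>
</root>
"""

elem = ET.fromstring(SAMPLE)

# Only <string> elements that are direct children of a tagC are matched;
# the one nested under note/title is not.
found = elem.findall("./tagA/tagB/tagC/string")
joined = ", ".join(e.text for e in found)
print(joined)  # Something, Something else
```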


- find all children of tagA/tagB
- check if /tagA/tagB/tagC/note/title/string has "value"
- if yes find all tagA/tagB/tagC/string

In quasi-Python:

strings = []
# elem is the already-parsed parent element that contains tagA
for element1 in elem.findall("./tagA/tagB"):
    # paths below are relative to the current tagB
    element2 = element1.find("./tagC/note/title/string")
    if element2 is not None and element2.text == "value":
        for element4 in element1.findall("./tagC/string"):
            strings.append(element4.text)

Crude, but works. As I wrote above, I was wishing that a bracketed clause of the type [@ ...] already in the first "findall" would do a more efficient job, but alas my knowledge of XML is too rudimentary. Perhaps something to tinker on in the coming weeks.
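For what it is worth, here is a hedged sketch of how a bracketed clause might be used. ElementTree's limited XPath support documents a `[tag='text']` predicate that tests a direct child's text (the `@` form is for attributes), so the test can be moved onto the title element. The predicate does not accept a nested path such as note/title/string, so the loop over tagB stays. The `root` wrapper, the second tagB, and the helper name `strings_for_title` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: the first tagB mirrors the sample above,
# the second tagB has no matching note and should be skipped.
SAMPLE = """
<root>
  <tagA>
    <tagB>
      <tagC><string>Something</string></tagC>
      <tagC><string>Something else</string></tagC>
      <tagC><note><title><string>value</string></title></note></tagC>
    </tagB>
    <tagB>
      <tagC><string>Ignored</string></tagC>
    </tagB>
  </tagA>
</root>
"""


def strings_for_title(elem, wanted):
    """Collect tagC/string texts from every tagB whose note title equals `wanted`.

    `wanted` is interpolated into the path, so it must not contain a
    single quote.
    """
    collected = []
    for tag_b in elem.findall("./tagA/tagB"):
        # [string='...'] keeps only <title> elements that have a <string>
        # child whose text equals the wanted value.
        match = tag_b.find(f"./tagC/note/title[string='{wanted}']")
        if match is not None:
            collected.extend(s.text for s in tag_b.findall("./tagC/string"))
    return collected


elem = ET.fromstring(SAMPLE)
print(strings_for_title(elem, "value"))
```

With real ISO metadata the tags are namespaced, so the paths would also need the namespace prefixes and a namespaces mapping passed to find and findall.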


Have a nice weekend!





On 2020-11-06 20:10, Terry Reedy wrote:

> On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
>> I am confronting some XML parsing challenges and would like to ask some questions to more knowledgeable Python users. Apparently there exists a group for such questions, but that list (xml-sig) has apparently not received (or archived) posts since May 2018(!). I wonder if there is another list or forum for Python XML questions, or if this list would be fine for that.
> If you don't hear otherwise, try here. Or try stackoverflow.com and tag questions with python and xml.

--
https://mail.python.org/mailman/listinfo/python-list
