> On 28 Jul 2022, at 00:01, Fitchett, Deborah <deborah.fitch...@lincoln.ac.nz> 
> wrote:
> 
> Um, firstly that xml contains personal information including names, addresses 
> and emails, so unless it's dummy data ... anyone know if there's a way we can 
> remove it from the list archives at least?
> 
> As to the question:
> * Excel can import xml data (and once you can get it into Excel at all it 
> should be possible to manipulate how you want) through Data > Get Data > From 
> file > XML
> * But it's very unhappy with this particular xml file. Looking at it in 
> Notepad++ you need to move the first XML declaration (currently in line 2) to 
> line 1 (above <root>), and then delete every other XML declaration.
> * Then Excel will load it, but of course you need to help it expand out the 
> data you want from all the nestings. When you get the option after loading 
> it, click on Transform data and you'll get to Power Query Editor.
> 
> At that point it's beyond me but there are probably tutorials online or you 
> could play around with it until it submits, depend on your preference! Or 
> someone may know of a friendlier tool.
> 

Once you got a valid XML file (using the steps mentioned by Deborah above) 
another option is OpenRefine (instead of Excel). OpenRefine will import the XML 
without any additional steps and put it into a tabular format for you - the way 
it manages this is to spread a single record across multiple rows in a data 
grid to deal with the hierarchical nature of the original data (in this case 
there isn’t a huge amount of hierarchy, but there are repeated categories). 
OpenRefine has a Records mode (where it understands that several rows of data 
make up a single record) to deal with this exact scenario.

`Once you have the data in OpenRefine you can manipulate it as you need to 
(rename columns etc.) and then export to Excel or CSV (OpenRefine supports 
both) as necessary.

There are lots of other options for converting XML to CSV - there are online 
converters and you can write code or transformations as XSLT - but for me 
OpenRefine is a nice spot between power and simplicity for working with this 
kind of data

OpenRefine download page: https://openrefine.org/download.html 
<https://openrefine.org/download.html>
OpenRefine documentation: https://docs.openrefine.org/ 
<https://docs.openrefine.org/>
OpenRefine tutorial aimed at library workers: 
https://librarycarpentry.org/lc-open-refine/ 
<https://librarycarpentry.org/lc-open-refine/> 

Owen

Reply via email to