Extracting Structure from HTML using Adam's dom.d

Nordlöw Wed, 21 Jan 2015 15:35:30 -0800

I'm trying to figure out how to most easily extract structured information using Adam D Ruppe's dom.d.


Typically I want the following HTML example

...

<h2> <span class="mw-headline" id="H2_A">More important</span></h2>

<p>This is <i>important</i>.</p>

<h2> <span class="mw-headline" id="H2_B">Less important</span></h2>

<p>This is not important.</p>
...

to be reduced to

This is <i>important</i>.

This means that I need some kind of interface to extract the contents of each <p> paragraph that is preceded by an <h2> heading with a specific id (say "H2_A") or content (say "More important"). How do I accomplish that?

Further, is there a way to extract only the "contents" of an Element instance, that is, "Stuff" from "<p>Stuff</p>", for each Element returned by, for example, getElementsByTagName(`p`)?
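
To make the request concrete, here is a rough sketch of the kind of helper I have in mind. It is not a confirmed solution: it assumes the module is imported as arsd.dom, and that Document.querySelector, Element.parentNode, Element.nextSibling, Element.tagName and Element.innerHTML behave as I expect. The function name paragraphsUnderHeading is hypothetical. As far as I can tell, innerHTML would give the markup inside an element and innerText the plain text, which may already cover the second question.

```d
import arsd.dom;   // assumption: dom.d is available under this module name
import std.stdio;

/// Hypothetical helper: collects the inner HTML of every <p> that follows
/// the <h2> containing the element with the given id, stopping at the
/// next <h2>.
string[] paragraphsUnderHeading(Document document, string headlineId)
{
    string[] result;

    // The id sits on the <span> inside the <h2>, so look it up first...
    auto headline = document.querySelector("#" ~ headlineId);
    if (headline is null)
        return result;

    // ...and step up to the enclosing <h2>.
    auto heading = headline.parentNode;
    if (heading is null)
        return result;

    // Walk the following siblings; whitespace text nodes are skipped
    // because their tag name is neither "h2" nor "p".
    for (auto e = heading.nextSibling; e !is null; e = e.nextSibling)
    {
        if (e.tagName == "h2")
            break;                  // next section starts here
        if (e.tagName == "p")
            result ~= e.innerHTML;  // contents only, without the <p> tags
    }
    return result;
}

void main()
{
    auto document = new Document(`<html><body>
<h2> <span class="mw-headline" id="H2_A">More important</span></h2>
<p>This is <i>important</i>.</p>
<h2> <span class="mw-headline" id="H2_B">Less important</span></h2>
<p>This is not important.</p>
</body></html>`);

    foreach (contents; paragraphsUnderHeading(document, "H2_A"))
        writeln(contents);
}
```

Matching on the heading text instead of the id would presumably mean comparing innerText of each <h2> against "More important" before walking the siblings in the same way.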
