Perhaps this library can help: https://github.com/google/gumbo-parser
It should be reasonably easy to call it from GNU APL.

On Tue, 23 Feb 2021 at 20:09, Blake McBride <blake1...@gmail.com> wrote:
> If I were parsing HTML, I would have an exception list that contained the
> few tags that don't have closing tags. I wouldn't expect a closing tag for
> those. If I did get one, I'd ignore it.
>
> There is a very small and fixed number of these exceptional tags. Custom
> tags should have closing tags.
>
> --blake
>
>
> On Tue, Feb 23, 2021 at 6:01 AM Dr. Jürgen Sauermann <
> mail@jürgen-sauermann.de> wrote:
>
>> Hi Blake,
>>
>> You're correct. Another problem in HTML is unquoted attribute values in
>> HTML tags.
>>
>> I should have said "One can use ⎕XML for decoding HTML pages and the
>> like, as long as they obey the fundamental XML encoding rules".
>>
>> I believe it would be possible to make ⎕XML tolerate some of these HTML
>> quirks, but I wonder if it is worth the effort.
>>
>> Best Regards,
>> Jürgen
>>
>>
>> On 2/22/21 10:11 PM, Blake McBride wrote:
>> > Some of those "optional" end tags are not optional at all. It's not
>> > valid HTML if one is present. For example:
>> >
>> > <br></br> is not HTML.
>> >
>> > --blake
>> >
>> >
>> > On Mon, Feb 22, 2021 at 1:21 PM Dr. Jürgen Sauermann
>> > <mail@jürgen-sauermann.de> wrote:
>> >
>> > Hi,
>> >
>> > As far as I understand it, HTML has almost the same format as XML
>> > (the main difference being end tags, which are optional in HTML but
>> > mandatory in XML). I would assume that ⎕XML can decode common web
>> > interfaces like the REST API or other XML-based queries quite well.
>> > Fetching the data can be done with ⎕FIO[32 ff.], so the combination
>> > of the two should almost do the job.
>> >
>> > Best Regards,
>> > Jürgen
>> >
>> >
>> > On 2/22/21 4:13 PM, Elias Mårtenson wrote:
>> >> This could be quite useful when collecting data from a web site.
>> >> For example, pull in a table of numbers from a Wikipedia page.
>> >> Google Docs has this feature already, and it can be quite useful.
>> >>
>> >> Regards,
>> >> Elias
>> >>
>> >> On Mon, 22 Feb 2021 at 22:26, Chris Moller <mol...@mollerware.com> wrote:
>> >>
>> >> Sounds like another native function! :-)
>> >>
>> >> Maybe after I finish my current project...
>> >>
>> >> On 2/22/21 5:26 AM, Hans-Peter Sorge wrote:
>> >>> Hi,
>> >>>
>> >>> I would modify the data model and/or process graph or use an
>> >>> adequate programming language.
>> >>> In my opinion, having to rely on data content to control
>> >>> program flow is 'costly'.
>> >>> (That may also be one reason why APL has no language-specific
>> >>> regular expressions.)
>> >>>
>> >>> My highest priority for APL would be a mapping between an
>> >>> APL name and a file, a directory, a DB table, a spreadsheet
>> >>> or an editor instance.
>> >>>
>> >>> APL was designed to contain code and data in a 'closed'
>> >>> workspace. In those days, data entry was done by humans,
>> >>> directly into the workspace.
>> >>> Nowadays my data very likely comes from somewhere outside
>> >>> the workspace. ⍎ ')host' and piping are already a big help here.
>> >>>
>> >>> But analyzing a web page, for example, is done faster in
>> >>> Python. With a proper infrastructure in APL, something like
>> >>>
>> >>>     page ← ⎕curl '...url...'
>> >>>     page['head';'link']
>> >>>
>> >>> could return all link tags. Just dreaming :-)
>> >>>
>> >>> However, please no if/then/else.
>> >>>
>> >>> Best Regards
>> >>> Hans-Peter
>> >>>
>> >>>
>> >>> On 20.02.21 at 19:59, Christian Robert wrote:
>> >>>> Well, I saw the new trends, a.k.a. Quad-XML, Quad-JSON,
>> >>>> Quad-FFT and so on,
>> >>>>
>> >>>> but I think those will never be used in real life, or only
>> >>>> rarely.
>> >>>>
>> >>>> I really think that Juergen should be looking at
>> >>>>
>> >>>> :if/:elseif/:else/:endif
>> >>>>
>> >>>> :for var :in array
>> >>>> loop
>> >>>> :endfor
>> >>>>
>> >>>> :while condition
>> >>>> loop
>> >>>> :endwhile
>> >>>>
>> >>>> :do
>> >>>> loop
>> >>>> :until condition
>> >>>>
>> >>>> This would make things easier for newcomers to the language.
>> >>>>
>> >>>> I know that APL's goal is to do a whole "program" in one or
>> >>>> two lines of code, but the language must accommodate newcomers.
>> >>>>
>> >>>> I asked for this several years ago (maybe 8 or 10 years ago).
>> >>>>
>> >>>> Juergen answered at the time "this can be done", but it has
>> >>>> not been done yet.
>> >>>>
>> >>>> So my main wish for the next improvements is
>> >>>> if/for/while/do_until.
>> >>>>
>> >>>> My real thoughts,
>> >>>>
>> >>>> Xtian.
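Blake's suggestion in this thread (a small, fixed exception list of tags that never take a closing tag, with any stray closing tag for them ignored) can be sketched in Python, the language Hans-Peter mentions for this kind of job. This is a minimal sketch, not ⎕XML and not Gumbo: it assumes Python's standard-library `html.parser` as the tokenizer, and `VOID_TAGS`, `TreeBuilder`, `parse` and `find_all` are hypothetical names chosen for illustration. The exception list is the set of void elements from the WHATWG HTML standard; an unmatched end tag for any other element is simply dropped, which is a simplification of real HTML5 tree construction.

```python
from html.parser import HTMLParser

# HTML void elements (WHATWG HTML standard): they never take a closing tag.
# This is the small, fixed "exception list" suggested in the thread.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds a nested dict tree from HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)  # void elements stay leaves: never opened

    # HTMLParser's default handle_startendtag calls handle_starttag and then
    # handle_endtag, so self-closing forms like <br/> need no extra code.

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # a stray </br> and the like: ignore it
        # Close the innermost open element with this name (and anything still
        # open inside it); the document root at index 0 is never closed.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                return
        # No matching open element: drop the stray end tag.

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)


def parse(html):
    """Parse an HTML string and return the root node of the dict tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Yield every element named `tag` below `node`, depth first."""
    for child in node["children"]:
        if isinstance(child, dict):
            if child["tag"] == tag:
                yield child
            yield from find_all(child, tag)


if __name__ == "__main__":
    page = parse(
        "<html><head><meta charset=utf-8><link rel=stylesheet href=a.css></head>"
        "<body>one<br>two<br></br>three</body></html>"
    )
    head = next(find_all(page, "head"))
    print([e["attrs"] for e in find_all(head, "link")])
    # prints [{'rel': 'stylesheet', 'href': 'a.css'}]
```

Here `find_all(head, "link")` plays roughly the role of Hans-Peter's imagined `page['head';'link']`. Because `html.parser` already accepts unquoted attribute values such as `rel=stylesheet`, the quirk Jürgen mentions needs no special handling in this sketch.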