Perhaps this library can help: https://github.com/google/gumbo-parser

It should be reasonably easy to call it from GNU APL.
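
For comparison, the exception-list approach Blake describes in the quoted message below can be sketched in a few lines. This is only an illustration in Python using the standard library's html.parser, not gumbo and not anything ⎕XML does today. The names VOID_TAGS, TolerantTreeBuilder and parse_html are made up for this sketch, and the tag list is the void-element list from the HTML standard as I understand it.

```python
from html.parser import HTMLParser

# Assumed exception list: the HTML void elements, which never take a closing tag.
VOID_TAGS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class TolerantTreeBuilder(HTMLParser):
    """Hypothetical helper: builds a nested dict tree and tolerates HTML quirks."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "attrs": {}, "children": []}
        self._stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # html.parser already accepts unquoted attribute values such as src=a.png
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)  # only non-void tags wait for a closing tag

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # a stray closing tag such as </br> is simply ignored
        # Close the nearest matching open element; ignore unmatched end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                return

    def handle_data(self, data):
        if data.strip():
            self._stack[-1]["children"].append(data)


def parse_html(text):
    """Parse an HTML fragment and return the root node of the tree."""
    builder = TolerantTreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root
```

With this, parse_html('<p>one<br>two<br></br></p>') yields a single p node whose children are the two text pieces and two br nodes; the stray </br> does not disturb the nesting.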

On Tue, 23 Feb 2021 at 20:09, Blake McBride <blake1...@gmail.com> wrote:

> If I were parsing HTML, I would have an exception list that contained the
> few tags that don't have closing tags.  I wouldn't expect a closing tag for
> those.  If I did get one, I'd ignore it.
>
> There is a very small and fixed number of these exceptional tags.  Custom
> tags should have closing tags.
>
> --blake
>
>
> On Tue, Feb 23, 2021 at 6:01 AM Dr. Jürgen Sauermann <
> mail@jürgen-sauermann.de> wrote:
>
>> Hi Blake,
>>
>> You're correct. Another problem in HTML is unquoted attribute values in
>> HTML tags.
>>
>> I should have said "One can use ⎕XML for decoding HTML pages and the
>> like as long as they obey the fundamental XML encoding rules".
>>
>> I believe it would be possible to make ⎕XML tolerate some of these HTML
>> quirks, but I wonder if it is worth the effort.
>>
>> Best Regards,
>> Jürgen
>>
>>
>>
>> On 2/22/21 10:11 PM, Blake McBride wrote:
>> > Some of those "optional" end tags are not optional at all.  It's not
>> > HTML if it's there.  For example:
>> >
>> > <br></br>    is not HTML.
>> >
>> > --blake
>> >
>> >
>> >
>> > On Mon, Feb 22, 2021 at 1:21 PM Dr. Jürgen Sauermann
>> > <mail@jürgen-sauermann.de> wrote:
>> >
>> >     Hi,
>> >
>> >     As far as I understand it, HTML has almost the same format as XML
>> >     (the main difference being optional end tags in HTML, which are
>> >     mandatory in XML). I would assume that ⎕XML can decode common web
>> >     interfaces such as REST APIs or other XML-based queries quite well.
>> >     Fetching the data can be done with ⎕FIO[32 ff.], so the combination
>> >     of the two should almost do the job.
>> >
>> >     Best Regards,
>> >     Jürgen
>> >
>> >
>> >     On 2/22/21 4:13 PM, Elias Mårtenson wrote:
>> >>     This could be quite useful when collecting data from a web site.
>> >>     For example, pull in a table of numbers from a Wikipedia page.
>> >>     Google Docs has this feature already and it can be quite useful.
>> >>
>> >>     Regards,
>> >>     Elias
>> >>
>> >>     On Mon, 22 Feb 2021 at 22:26, Chris Moller <mol...@mollerware.com>
>> >>     wrote:
>> >>
>> >>         Sounds like another native function!  :-)
>> >>
>> >>         Maybe after I finish my current project...
>> >>
>> >>         On 2/22/21 5:26 AM, Hans-Peter Sorge wrote:
>> >>>         Hi,
>> >>>
>> >>>         I would modify the data model and/or process graph or use an
>> >>>         adequate programming language.
>> >>>         In my opinion, having to rely on data content to control
>> >>>         program flow is 'costly'.
>> >>>         (That may also be one reason why APL has no language-specific
>> >>>         regular expressions.)
>> >>>
>> >>>         My highest priority for APL would be the mapping between an
>> >>>         APL name and a file,
>> >>>         directory, a db-table, a spreadsheet or an editor instance.
>> >>>
>> >>>         APL was designed to contain code and data in a 'closed'
>> >>>         workspace.
>> >>>         In those days data entry was done by humans, directly into
>> >>>         the workspace.
>> >>>         Nowadays the data most likely comes from somewhere outside
>> >>>         of the workspace.
>> >>>         ⍎ ')host' and piping are already a big help here.
>> >>>
>> >>>         But analyzing a web page, for example, is done faster in
>> >>>         Python.
>> >>>         Having a proper infrastructure in APL, like
>> >>>         page ← ⎕curl '...url...'
>> >>>         page['head';'link']
>> >>>         could return all link tags - just dreaming :-)
>> >>>
>> >>>         However - please no if/then/else
>> >>>
>> >>>         Best Regards
>> >>>         Hans-Peter
>> >>>
>> >>>
>> >>>         On 20 Feb 2021 at 19:59, Christian Robert wrote:
>> >>>>         Well, I saw the new trends, i.e. Quad-XML, Quad-JSON,
>> >>>>         Quad-FFT and so on,
>> >>>>
>> >>>>         but I think those will never be used in real life, or quite
>> >>>>         seldom.
>> >>>>
>> >>>>         I really think that Juergen should be looking at
>> >>>>
>> >>>>         :if/:elseif/:else/:endif
>> >>>>
>> >>>>         :for var :in array
>> >>>>           loop
>> >>>>         :endfor
>> >>>>
>> >>>>         :while condition:
>> >>>>           loop
>> >>>>         :endwhile
>> >>>>
>> >>>>         :do
>> >>>>           loop
>> >>>>         :until condition
>> >>>>
>> >>>>         This would ease newcomers into the language.
>> >>>>
>> >>>>         I know that APL's goal is to do a whole "program" in one or
>> >>>>         two lines of code...
>> >>>>         but the language must accommodate newcomers.
>> >>>>
>> >>>>         I asked for that several years ago (maybe 8 or 10 years).
>> >>>>
>> >>>>         Juergen answered at that time "this can be done, but I
>> >>>>         won't yet".
>> >>>>
>> >>>>         Well, the top of my wish list for the next improvements is
>> >>>>         if/for/while/do_until.
>> >>>>
>> >>>>         My real thoughts,
>> >>>>
>> >>>>         Xtian.
>> >>>>
>> >>>
>> >>
>> >
>>
>>
