[Lynx-dev] Extract links from html with application/ld+json script

2023-12-17 Thread Super Bonaci via Lynx-dev
Version in use: Lynx Version 2.8.9rel.1 (08 Jul 2018)
Some HTML pages contain 

Re: [Lynx-dev] Extract links from html with application/ld+json script

2023-12-17 Thread David Woolley
On 17/12/2023 19:31, Super Bonaci via Lynx-dev wrote:
> Lynx is not able to extract most html links inside the html file.
There are no HTML links in 9ed7a8bb (no anchor elements, and all occurrences of href are either in link elements, which don't generate visible hyperlinks, inline, except fo

Re: [Lynx-dev] Extract links from html with application/ld+json script

2023-12-17 Thread David Woolley
Looking a bit further, ld+json is a database serialisation format, based on JavaScript, but it is declarative. It definitely isn't HTML, but one could render it by basically pretty-printing, without the need to handle the generalities of JavaScript. You may, though, have to manually extract it
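
As a rough illustration of the "pretty printing" idea, here is a minimal Python sketch using only the standard json module. It assumes the ld+json text has already been extracted from the page by hand; the function name pretty_print_json_ld and the sample data are made up for this example and are not part of Lynx.

```python
import json


def pretty_print_json_ld(raw: str) -> str:
    """Return an indented rendering of a JSON-LD string (hypothetical helper)."""
    data = json.loads(raw)  # JSON-LD is plain JSON, so no JavaScript engine is needed
    return json.dumps(data, indent=2, ensure_ascii=False)


# Made-up sample data, standing in for the contents of a
# <script type="application/ld+json"> element.
sample = '{"@context": "https://schema.org", "@type": "Article", "url": "https://example.org/a"}'
print(pretty_print_json_ld(sample))
```

Because the format is declarative, parsing and re-indenting it is enough to make it readable; nothing is executed.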

Re: [Lynx-dev] Extract links from html with application/ld+json script

2023-12-17 Thread Thorsten Glaser
David Woolley dixit:
> Lynx does not even have a JSON interpreter and I'm sure it doesn't
> have a JSON pretty printer.
Yeah, that’s totally out of scope. Use tools like cURL / GNU wget, sed/tidy/xmlstarlet to extract the JSON, jq to parse it, instead.
bye, //mirabilos -- Support mksh as /bin/s
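
The external-tools approach suggested above (fetch the page, extract the JSON, then parse it) could also be sketched in a single Python script. This is an assumption-laden illustration, not the pipeline from the thread: it swaps sed/tidy/xmlstarlet and jq for Python's standard html.parser and json modules, the names JsonLdExtractor, collect_urls and extract_links are hypothetical, and it treats any string value starting with http:// or https:// as a link.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def collect_urls(node):
    """Recursively gather URL-looking strings, skipping the @context entry."""
    urls = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key != "@context":  # vocabulary reference, not a page link
                urls.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item))
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        urls.append(node)
    return urls


def extract_links(html_text):
    """Return URLs found in all JSON-LD blocks of an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    urls = []
    for block in parser.blocks:
        try:
            urls.extend(collect_urls(json.loads(block)))
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
    return urls
```

The HTML text would come from a file saved with cURL or wget, for example extract_links(open("page.html", encoding="utf-8").read()), where page.html is a placeholder file name.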