Re: [dev] suckless html to markdown (text)

2019-01-06 Fischers Fritz
Although implementations usually get this wrong, Markdown is supposed to be an extension of HTML; that is, any HTML document is also a Markdown document. Consequently, you can use cat(1) to convert:

    cat webpage.html > webpage.md

You likely also want to remove some of the HTML tags and use the M
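Here is a minimal C sketch of the "remove some of the HTML tags" step. The function name strip_tags is hypothetical. It assumes no '>' ever appears inside an attribute value and that the input has no comments or <script> blocks. It also drops every tag instead of choosing which ones to keep.

```c
/* Copy src to dst, dropping everything from '<' up to the next '>'.
 * Naive by design: assumes no '>' inside attribute values, comments
 * or <script> blocks. dst must hold at least strlen(src) + 1 bytes. */
void strip_tags(const char *src, char *dst)
{
	int intag = 0;	/* 1 while between '<' and the closing '>' */

	for (; *src; src++) {
		if (*src == '<')
			intag = 1;
		else if (*src == '>' && intag)
			intag = 0;
		else if (!intag)
			*dst++ = *src;	/* text outside any tag is kept */
	}
	*dst = '\0';
}
```

Given a main that reads stdin into a buffer, this works as a filter after the cat(1) step. It does not decode character references such as &amp;.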

Re: [dev] suckless html to markdown (text)

2019-01-06 Nick
Quoth Alexander Krotov:
> > Ideally, with sed/awk, or better in C.
>
> "Parsing" HTML with sed is simply wrong.

This is a good point that I should have mentioned. I spent years using sed and awk to extract things from HTML, writing crawlers and suchlike, for personal projects. It can work, of c

Re: [dev] suckless html to markdown (text)

2019-01-06 Alexander Krotov
> Ideally, with sed/awk, or better in C.

"Parsing" HTML with sed is simply wrong. You need to use a decent HTML parsing library, as parsing HTML is complex. There is https://github.com/yujiahaol68/downmark, which uses the Go html library, but I have not tried it. Seriously though, if you are not g
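The example below shows why tag stripping by pattern matching breaks. A naive filter ends the tag at the first '>', so <a title="a > b">x</a> leaks ` b">x` into the output. The sketch uses a hypothetical function, strip_tags_quoted, that handles quoted attribute values and <!-- --> comments. It is still far from an HTML parser: it does not decode entities, it does not treat <script>/<style> contents as raw text, and it does not follow the HTML5 error-recovery rules. Each of those gaps needs another special case, which is the argument for using a real parsing library.

```c
#include <string.h>

/* Strip tags, but ignore '>' inside single- or double-quoted
 * attribute values and skip <!-- ... --> comments entirely.
 * Still NOT an HTML parser (no entities, no raw-text elements,
 * no HTML5 error recovery). dst needs strlen(src) + 1 bytes. */
void strip_tags_quoted(const char *src, char *dst)
{
	int intag = 0;	/* 1 while inside <...> */
	char quote = 0;	/* active quote character inside a tag, or 0 */

	while (*src) {
		if (!intag && strncmp(src, "<!--", 4) == 0) {
			const char *end = strstr(src + 4, "-->");
			if (!end)
				break;	/* unterminated comment: drop the rest */
			src = end + 3;
			continue;
		}
		if (!intag) {
			if (*src == '<')
				intag = 1;
			else
				*dst++ = *src;	/* plain text is kept */
		} else if (quote) {
			if (*src == quote)
				quote = 0;	/* end of attribute value */
		} else if (*src == '"' || *src == '\'') {
			quote = *src;	/* start of attribute value */
		} else if (*src == '>') {
			intag = 0;
		}
		src++;
	}
	*dst = '\0';
}
```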

Re: [dev] suckless html to markdown (text)

2019-01-05 ssd
I'm afraid pandoc won't be considered suckless by most of the list, but I would second Nick's recommendation: pandoc is the only tool that eventually worked reliably for my tasks. Especially in a corporate environment, I appreciate that I can convert across formats, even to docx and import to / e

Re: [dev] suckless html to markdown (text)

2019-01-04 Nick
Hi Thuban,

Quoth Thuban:
> I'm looking for a suckless html to markdown (or text) tool.
> Ideally, with sed/awk, or better in C.

pandoc seems to always do a reasonable job - I use it daily for this. It's written in haskell, which may not fit your definition of suckless, but it is widely used

Re: [dev] suckless html to markdown (text)

2019-01-02 Calvin Morrison
On Tue, 1 Jan 2019 at 13:33, Thuban wrote:
>
> Hi,
> I'm looking for a suckless html to markdown (or text) tool.
> Ideally, with sed/awk, or better in C.
>
> Any idea?
>
> Regards
> --
> thuban
>

Not relevant but here is a md2html awk script I have used in the past:
https://github.com/wlan

[dev] suckless html to markdown (text)

2019-01-01 Thuban
Hi,

I'm looking for a suckless html to markdown (or text) tool.
Ideally, with sed/awk, or better in C.

Any idea?

Regards
-- 
thuban
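For the plain-text half of the question, stripping tags still leaves character references such as &amp; and &lt; in the output. Below is a minimal C sketch of a hypothetical decode_entities function. It decodes only the five XML-predefined names and decimal references below 128, and copies everything else through unchanged. HTML defines over two thousand named references, so a real tool needs a full table.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode &amp; &lt; &gt; &quot; &apos; and decimal &#N; with N < 128.
 * Anything else is copied unchanged. Output is never longer than
 * input, so dst may equal src (in-place decoding). */
void decode_entities(const char *src, char *dst)
{
	static const struct {
		const char *name;
		char ch;
	} ents[] = {
		{ "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
		{ "&quot;", '"' }, { "&apos;", '\'' },
	};
	const size_t nents = sizeof ents / sizeof ents[0];
	size_t i, n = 0;
	char *end;
	long cp;

	while (*src) {
		if (*src != '&') {
			*dst++ = *src++;
			continue;
		}
		for (i = 0; i < nents; i++) {	/* try each named reference */
			n = strlen(ents[i].name);
			if (strncmp(src, ents[i].name, n) == 0)
				break;
		}
		if (i < nents) {
			*dst++ = ents[i].ch;
			src += n;
			continue;
		}
		if (src[1] == '#' && isdigit((unsigned char)src[2])) {
			cp = strtol(src + 2, &end, 10);
			if (*end == ';' && cp > 0 && cp < 128) {	/* ASCII only */
				*dst++ = (char)cp;
				src = end + 1;
				continue;
			}
		}
		*dst++ = *src++;	/* unknown reference: keep the '&' */
	}
	*dst = '\0';
}
```

Running this after a tag-stripping pass gives rough plain text. The order matters: decoding first would turn &lt;b&gt; into a literal <b>, and the stripper would then remove it.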