Here's a very naïve approach, which will probably work. Might screw up your
<PRE> sections though...
perl -p0
s/
>/>
/g;s/^[^>]{1,69}[>"]/join$",split' ',$&/mge
Including an end of line match might make it a little more resilient:
^[^>]{1,69}[>"]$
It puts the attributes on the same line as the tag, but that looks right to
me... i.e. you get:
<META NAME="GENERATOR">
Instead of
<META
NAME="GENERATOR">
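
For anyone who wants to poke at it, here is roughly the same thing spelled out as a sub. This is only a sketch: the name tidy_html is made up for illustration, and it assumes the whole document arrives as one string, which is what -p0 gives you when the input contains no NUL bytes.

```perl
use strict;
use warnings;

# Hypothetical helper name; the body mirrors the two substitutions above.
sub tidy_html {
    my ($html) = @_;

    # Step 1: a '>' sitting at the start of a line moves to the end of
    # the previous line.
    $html =~ s/\n>/>\n/g;

    # Step 2: from the start of a line, take up to 69 non-'>' characters
    # (newlines included) followed by '>' or '"', and collapse every run
    # of whitespace inside that match to a single space.
    $html =~ s/^[^>]{1,69}[>"]/join ' ', split ' ', $&/mge;

    return $html;
}

print tidy_html("<META\nNAME=\"GENERATOR\"\n>");
```

The one-liner joins with $" rather than ' ', which is the same thing as long as nobody has changed the list separator. Because [^>] also matches newlines, a tag longer than 70 characters gets joined only up to the last '"' that fits, so very long tags can end up partly collapsed.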
Greg
-----Original Message-----
From: Phil Carmody [mailto:[EMAIL PROTECTED]
Sent: Friday, October 10, 2003 1:34 PM
To: [EMAIL PROTECTED]
Subject: HTML de-uglifier in 2 lines of perl
#!/usr/bin/perl -n
chomp;if($#p>=0&&s/^(\"?>)//){$p[-1].="$1\n";print(join($w<70?' ':"\n",@p));@p=($_);$w=0}
else{push@p,$_}$w+=length;}{print(join("\n",@p))if($#p>=0);
I wrote that because docbook2html produces ugly HTML:
<<<
<HTML
><HEAD
><TITLE
>A World Wide Web Interface to CTAN</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"></HEAD
><BODY
...
>>>
and I wanted (IMHO) prettier HTML:
<<<
<HEAD>
<TITLE>
A World Wide Web Interface to CTAN</TITLE>
<META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+">
</HEAD>
<BODY
...
>>>
The script also tries to join multiple attributes onto the same line,
as long as the line wouldn't be too long (70 chars) as I also find that
improves the readability of HTML (by reducing the noise level).
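
Spelled out with names, the joining logic looks something like the sketch below. The sub name deuglify and the variable names are invented for illustration, and it assumes that any line not starting with '>' or '">' is simply pushed onto the pending list.

```perl
use strict;
use warnings;

# Hypothetical ungolfed version; takes already-chomped input lines and
# returns the reformatted text as one string.
sub deuglify {
    my @lines = @_;
    my @pending;        # pieces of the tag currently being collected
    my $width = 0;      # summed length of the pieces in @pending
    my $out   = '';

    for my $line (@lines) {
        if (@pending && $line =~ s/^("?>)//) {
            # The line starts with '>' or '">', so it closes the pending
            # tag: glue the closer on, then emit the tag on one line if
            # it is short enough, otherwise one piece per line.
            $pending[-1] .= "$1\n";
            $out .= join($width < 70 ? ' ' : "\n", @pending);
            @pending = ($line);   # the rest of the line opens the next tag
            $width   = 0;
        }
        else {
            push @pending, $line;
        }
        $width += length $line;
    }

    # Flush whatever is left once the input runs out.
    $out .= join("\n", @pending) if @pending;
    return $out;
}

print deuglify('<HTML', '><HEAD', '><META', 'NAME="GENERATOR"', '></HEAD', '>');
```

Note that the width only counts the pieces themselves, not the spaces used to join them, so a joined line can come out slightly longer than 70 characters.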
As I suck at perl, I reckon that something only half the length of that
might be possible.
Don't spend more than 2 minutes on it. I didn't!
Phil