On 27/01/2011 4:43 PM, Per Jessen wrote:
Lawrence @ Rogers wrote:
On 27/01/2011 4:15 AM, Per Jessen wrote:
I've just been looking at a mail that got a hit on
HTML_TAG_BALANCE_HEAD due to this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd"> <html
xmlns="http://www.w3.org/1999/xhtml"> <head/>
<body style="width: 800px">
I can't quite figure out whether the short tag syntax is allowed -
the HTML above was generated by XSLT based on this input:
<head></head>
Other "popular" short tags: <br/>, <div/>, <p/> - I don't think we
should be judging those to be unbalanced HTML tags.
/Per Jessen, Zürich
As a person who writes HTML/XHTML every single day, there are several
flaws in your argument:
- <head/> is not valid HTML or XHTML (in any version)
Ah, because it needs at least <title>. Okay.
- HTML 4.01 Transitional doesn't allow for an XHTML xmlns attribute,
nor does it permit "short tags"
Irrelevant for this issue. SpamAssassin doesn't care about the DTD when
it's evaluating for unbalanced tags. Use your imagination and put any
suitable DTD instead.
- The only "valid" short tag that you mentioned is <br />. <div/> and
<p/> are not
They're certainly all valid in XHTML. (the validator at w3c says ok for
both).
- Using a short tag without a space between the name and the / is also
not recommended as it causes problems for older browsers and poorly
written HTML parsers.
Irrelevant for this issue.
You appear to have made a flawed statement based upon a flawed study
Gee, what's with the hostility? I never made an argument, I asked a
simple question.
(no HTML e-mail will ever be just a <head></head> combination)
I didn't suggest that.
/Per Jessen, Zürich
Hi Per,
I did not intend for my message to be hostile in any way. My apologies
if my terse tone came across that way.
<div/> and <p/> may pass the validator, but that is most certainly a
bug. A quick look through the XHTML 1.0 DTDs reveals only ten tags that
may be closed using the short form, and I am unable to find any
documentation on the W3C web site that supports anything else.
<area />
<base />
<br />
<col />
<hr />
<img />
<input />
<link />
<meta />
<param />
Using the short form on any other element would result in HTML rendering
engines choking and giving unpredictable results.
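
To make the list above concrete, here is a minimal sketch in Python that
flags short-form tags falling outside those ten elements. It relies on
the standard library's html.parser, which reports <tag/> syntax through
handle_startendtag(). The names VOID_ELEMENTS, ShortTagFinder and
suspect_short_tags are made up for this illustration; this is not how
SpamAssassin or any validator does it.

```python
from html.parser import HTMLParser

# The ten elements from the list above. The set name is hypothetical.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "hr",
    "img", "input", "link", "meta", "param",
}


class ShortTagFinder(HTMLParser):
    """Collects tags written in short form (<tag/>) that are not in the list."""

    def __init__(self):
        super().__init__()
        self.suspect = []

    def handle_startendtag(self, tag, attrs):
        # Called by HTMLParser for XHTML-style empty tags such as <br/>.
        if tag not in VOID_ELEMENTS:
            self.suspect.append(tag)


def suspect_short_tags(markup):
    """Return the short-form tags in markup that are not in VOID_ELEMENTS."""
    finder = ShortTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.suspect
```

Feeding it the fragment from the original mail, for example
suspect_short_tags('<html><head/><body style="width: 800px"><br/></body></html>'),
should report head and leave br alone.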
What I was suggesting is that your belief is flawed because your test
was flawed itself. No e-mail will ever be just <head></head>. Ignoring
the fact that a <title> tag is required as a minimum (although many
e-mails probably omit it), the <head/> form is invalid as well.
SpamAssassin may not care about DTDs and the like, but HTML rendering
engines such as the one used in Internet Explorer (where people may be
using webmail clients) and Outlook (which recently switched from IE's
engine to a crappy one used in Microsoft Word) do care. Programs that
send HTML e-mails are going to do at least the bare minimum to ensure
their messages are displayed and readable, and their authors will know
that Internet Explorer's HTML rendering engine is what will most likely
be parsing the HTML they supply. This almost ensures that an HTML
message will look at least like this:
<html>
<head></head>
<body>
Some content here
</body>
</html>
Even spammers know that using anything less than the above runs a very
real risk of the message being unable to be displayed, which would make
the e-mail completely pointless.
I believe that the behavior of HTML_TAG_BALANCE_HEAD is valid in this
case, as <head/> is invalid HTML (despite what the validator says) and
should not be used by anyone.
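
As a rough illustration of why a balance test might trip on <head/>,
here is a deliberately naive Python sketch that counts opening and
closing tags by name and knows nothing about short-form syntax. I am
not claiming this is how HTML_TAG_BALANCE_HEAD is implemented; the
function name tag_balance and the regular expressions are assumptions
made up for this example.

```python
import re


def tag_balance(markup, name):
    """Return (opens, closes) for the element `name`, counted naively.

    Hypothetical helper: a short-form tag such as <head/> is counted as
    an opening tag with no matching close, because the "/" is simply
    treated as part of the tag's contents.
    """
    open_re = re.compile(r"<%s\b[^>]*>" % re.escape(name), re.IGNORECASE)
    close_re = re.compile(r"</%s\s*>" % re.escape(name), re.IGNORECASE)
    return len(open_re.findall(markup)), len(close_re.findall(markup))
```

With this counter, tag_balance('<html><head/><body></body></html>', 'head')
gives one open and no close, while the <head></head> form gives one of
each.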
Regards,
Lawrence
(For what it's worth, <div/> and <p/> are not "popular". I've never seen
them used on any legit site)