Folks,
A brief preface: I believe Li Ngok Lam has found a clear 'issue' with the original request to solve this with a regex.
My working assumption was that the OP needed a filter that would clean up a bunch of pre-existing static *.html files because the site had adopted a new scheme, so these older pages would merely need to be 'cleaned'....
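For that batch-cleaning case, a minimal sketch of an in-place filter might look like the following. The sub name clean_files_in_place and the filter callback are hypothetical; the sketch leans on Perl's own $^I in-place editing (the same thing perl -i.bak does), so each file keeps a .bak copy of the original.

```perl
use strict;
use warnings;

# Hypothetical helper: run every line of each named file through $filter,
# rewriting the file in place and keeping a .bak copy of the original.
sub clean_files_in_place {
    my ($filter, @files) = @_;
    local $^I   = '.bak';    # same effect as perl -i.bak
    local @ARGV = @files;    # the <> operator reads these files in turn
    while (my $line = <>) {
        my $out = $filter->($line);
        print $out;          # with $^I set, this goes to the rewritten file
    }
    return;
}

# usage, e.g. dropping bgcolor="..." from every page in the directory:
# clean_files_in_place(sub { (my $l = shift) =~ s/\s+bgcolor\s*=\s*"[^"]*"//gi; $l }, glob('*.html'));
```

The filter is passed in as a code ref so the same loop can be reused for whatever RegEx the cleanup finally settles on.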
But since some here may also have scratched their heads at the original request, let's step aside for a moment and look at some of the issues....
On Friday, Mar 21, 2003, at 09:05 US/Pacific, Li Ngok Lam wrote: [..]
> Anyway, the bgcolor can be formed or change again via javascript or CSS.
> I mean, blocking bgcolor in body tag cannot solve your potential problem.
This, of course, is the 'critical kill' in the OP's problem: trying to 'control it all' from some CGI script that is 'generating' web pages from various 'input streams'. { hey, we all started some place, and figured out our better ways along the way... }
Let's deal with the CSS/SSI side plays first, as the javascript side is modestly easier to solve.
There are CSS as well as various SSI directives which, were we to seek completeness, would require a much more complex parser, since it would need to deal with each of them in turn and DO the 'resolve in place'. E.g., given
<head>
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1">
<title>Welcome</title>
<link href="../CSS/sitewide.css" rel="stylesheet" media="screen">
</head>
the parser would need to grot through the *.css file and determine whether it has any background or background-color rules (CSS has no 'bgcolor' property as such). If it is clean, the link can stay; otherwise that part of the text would need to be reconstructed and pushed into the data stream inline, eg:
<html><head><title> Welcome </title>
<style>
<!--
body { font-family: Arial, Helvetica, Geneva, Swiss, SunSans-Regular }
p { font-size: 12px; font-family: Arial, Helvetica, Geneva, Swiss, SunSans-Regular }
td { font-size: 12px; font-family: "Times New Roman", Georgia, Times }
element { }
-->
</style>
</head>
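A rough sketch of that 'resolve in place' step, assuming the CGI script has already read the stylesheet into a string (fetching ../CSS/sitewide.css is left out). The subs clean_css and inline_stylesheet are hypothetical, and the regexes are naive: they assume there is no ';' or '}' inside a url(...) value or a quoted string.

```perl
use strict;
use warnings;

# Hypothetical helper: drop background / background-* declarations from CSS
# text, leaving every other declaration alone. Naive: assumes there is no
# ';' or '}' inside a url(...) value or a quoted string.
sub clean_css {
    my ($css) = @_;
    $css =~ s/\bbackground(?:-[a-z]+)?\s*:\s*[^;}]*;?//gi;
    return $css;
}

# Hypothetical helper: if the stylesheet needed cleaning, swap the first
# <link rel="stylesheet"> tag for an inline <style> block built from the
# cleaned text; if it was already clean, leave the page untouched.
sub inline_stylesheet {
    my ($html, $css) = @_;
    my $clean = clean_css($css);
    return $html if $clean eq $css;    # stylesheet is clean: keep the <link>
    my $style = "<style>\n<!--\n$clean\n-->\n</style>";
    $html =~ s{<link\b[^>]*\brel=["']?stylesheet["']?[^>]*>}{$style}i;
    return $html;
}
```

This follows the 'if clean, let it stay' rule literally: the <link> is only replaced when clean_css actually removed something.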
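We, of course, would not need to put the static 'content-type' meta tag into a dynamic stream back to the web browser, since as a Perl CGI script we already need to send out the
use Socket qw(:crlf);   # exports $CR, $LF and $CRLF
print "Content-Type: text/html;charset=ISO-8859-1$CR$LF$CR$LF";   # header line, then the blank line that ends the headers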
anyway, right???
> But you may find someway to put this in your body tag : background="white_block.jpg",
While we are proposing the idea of replacing, it is important to remember that the 'background' attribute is 'acceptable' in more than just the body tag... But you probably would not want to ship a src such as a jpg file in the process if all you really want to do is redefine the color to, say, white, eg:
bgcolor="#ffffff"
The RegEx I proposed would, of course, remove the string
background="white_block.jpg"
from any 'input' provided, since it does not care whether the value is alphanumeric or not; it was designed to remove whatever sits in the ="..." as it were...
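For the attribute side, something along these lines would do it. This is a hypothetical sketch, not the RegEx from the earlier message: scrub_background_attrs and set_white_body are made-up names, the sketch only looks inside tags (so text like 'background = blue' in the page copy is left alone), and it will trip over a '>' inside a quoted attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: strip every bgcolor= and background= attribute from
# every tag, whatever the value looks like.
sub scrub_background_attrs {
    my ($html) = @_;
    $html =~ s{(<[^>]+>)}{
        my $tag = $1;
        # the value may be double-quoted, single-quoted, or bare
        $tag =~ s/\s+(?:bgcolor|background)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)//gi;
        $tag;
    }ge;
    return $html;
}

# Hypothetical helper: after scrubbing, pin the body to plain white
# instead of shipping an image just to get a white page.
sub set_white_body {
    my ($html) = @_;
    $html =~ s{<body\b([^>]*)>}{<body bgcolor="#ffffff"$1>}i;
    return $html;
}
```

Running set_white_body after the scrub gives the bgcolor="#ffffff" result described above without dragging white_block.jpg along.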
> [..]as wallpaper goes upper than bgcolor or using javascript : document.bgColor='ff0000'; // not sure if this run on NS too
This part of the problem is where one needs to expand the RegEx as well, so that it deals with possible contamination inside a javascript element, most likely triggered by the 'onload' handler...
But the 'patterns'

document.bgColor
document.body.background

etc. could likewise be 'targeted' for conversion, on the fly and/or 'in place', with the same type of filtering and an appropriate RegEx.
The trick in those cases, of course, is that javascript allows white space on either side of the "=", so one is looking at something like
$line =~ s/document\.bgColor\s*=\s*(["']?)[^"';\s]*\1\s*;?//gi;
with the optional (["']?) quote group in this case, since single or double quotes (or none at all) are possible....
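Pulling the javascript patterns together, a sketch might look like this. The property list in $js_color_re (document.bgColor, document.background and their document.body.* forms) is an assumed list, as is the sub name scrub_js_colors; extend both as needed. There is no /i here, since javascript property names are case-sensitive.

```perl
use strict;
use warnings;

# Hypothetical pattern: an assignment to one of the targeted color/wallpaper
# properties, with optional quotes and an optional trailing semicolon.
my $js_color_re = qr{
    document\.(?:body\.)?(?:bgColor|background)   # targeted properties, an assumed list
    \s*=\s*                                       # blanks allowed around =
    (["']?)                                       # optional opening quote
    [^"';\s]*                                     # the value itself
    \1                                            # closing quote must match the opening one
    \s*;?                                         # optional trailing semicolon
}x;

# Hypothetical helper: remove every such assignment from a chunk of text.
sub scrub_js_colors {
    my ($text) = @_;
    $text =~ s/$js_color_re//g;
    return $text;
}
```

Note that it leaves behind an empty onload="" when the color change was all the handler did, which browsers simply ignore.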
HTH.