volks,


Brief prefix: I believe Li Ngok Lam has found a clear
'issue' in the original request for solving a regex problem.

My working assumption was that the OP needed a filter that
would clean up a bunch of pre-existing static *.html files
because the site had adopted a new scheme, so these older
pages would merely need to be 'cleaned'...

But since some here may also have scratched their heads at the
original request, let's step aside for a moment and look at some of the issues...


On Friday, Mar 21, 2003, at 09:05 US/Pacific, Li Ngok Lam wrote:
[..]
> Anyway, the bgcolor can be set or changed again via javascript or CSS.
> I mean, blocking bgcolor in the body tag cannot solve your potential problem.

This, of course, is the 'critical kill' for the OP's approach of trying to 'control it all' from some CGI script that is 'generating' web pages from various 'input streams'. { Hey, we all started some place, and figured out our better ways along the way... }

Let's deal with the CSS/SSI side plays first, as the
javascript side is modestly easier to solve.

There are CSS as well as various SSI directives which, were
we to seek completeness, would require a much more complex
parser, since it would need to deal with each of them in
turn and actually DO the 'resolve in place'. E.g., given

<head>
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1">
<title>Welcome</title>
<link href="../CSS/sitewide.css" rel="stylesheet" media="screen">
</head>

the parser would need to grot through the *.css file and
determine whether it has any background-color components.
If it is clean, the <link> can stay; otherwise that part of
the text would need to be reconstructed and pushed into the
data stream, e.g.:

<html><head><title> Welcome </title>
<style>
<!--
body { font-family: Arial, Helvetica, Geneva, Swiss, SunSans-Regular }
p { font-size: 12px; font-family: Arial, Helvetica, Geneva, Swiss, SunSans-Regular }
td { font-size: 12px; font-family: "Times New Roman", Georgia, Times }
element { }
-->
</style>
</head>
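A rough sketch of that 'resolve in place' step, under loudly stated assumptions: the stylesheet text is already in memory in a hash keyed by href (a real filter would read ../CSS/sitewide.css from disk, relative to the page), and a couple of regexes stand in for a real CSS parser, so comments, @media blocks and quoted strings are not handled. The helper names strip_css_backgrounds, _inline_one and inline_clean_css are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop background, background-color and
# background-image declarations from a chunk of CSS text. A regex is only
# a rough stand-in for a CSS parser (no comments, strings or @media).
sub strip_css_backgrounds {
    my ($css) = @_;
    $css =~ s/\bbackground(?:-color|-image)?\s*:[^;}]*;?//gi;
    return $css;
}

# Hypothetical helper: decide what to emit for one <link ...> tag.
sub _inline_one {
    my ($tag, $sheets) = @_;
    return $tag unless $tag =~ /\brel\s*=\s*["']?stylesheet/i;
    my ($href) = $tag =~ /\bhref\s*=\s*["']([^"']+)["']/i;
    return $tag unless defined $href && exists $sheets->{$href};
    my $css   = $sheets->{$href};
    my $clean = strip_css_backgrounds($css);
    return $tag if $clean eq $css;    # no background rules: let it stay
    # otherwise push the cleaned rules into the stream as an inline block
    return qq{<style type="text/css">\n$clean</style>};
}

# Hypothetical entry point: $sheets maps each href to its CSS text
# (assumption: in real use the text would be read from the .css file).
sub inline_clean_css {
    my ($html, $sheets) = @_;
    $html =~ s{(<link\b[^>]*>)}{_inline_one($1, $sheets)}gei;
    return $html;
}
```

A <link> whose stylesheet is already clean passes through untouched, which keeps the common case cheap; only a contaminated sheet gets inlined.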


We would not need to put the static 'content-type' meta tag
in a dynamic stream back to the web browser, since as a
perl CGI script we need to send out

print "Content-Type: text/html; charset=ISO-8859-1$CR$LF$CR$LF";

anyway, right??? (The second $CR$LF matters: the header
block has to be terminated by an empty line.)
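A minimal sketch of sending that header from Perl. The original does not say where $CR and $LF come from; this assumes the core Socket module's :crlf export tag, which provides $CR, $LF and $CRLF. The header block must end with an empty line, so the CR LF pair is written twice. print_html_header is a hypothetical name.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # core module; exports $CR, $LF and $CRLF

# Hypothetical helper: emit the CGI response header on the given handle.
# The blank line (second CRLF) ends the header block; the HTML follows it.
sub print_html_header {
    my ($fh) = @_;
    print {$fh} "Content-Type: text/html; charset=ISO-8859-1$CRLF$CRLF";
    return;
}
```

In a CGI script this would be called once as print_html_header(\*STDOUT) before any of the page is printed.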

> But you may find some way to put this in your body tag:
> background="white_block.jpg",

While we are proposing the idea of replacing, it is important to remember that the 'background' attribute is 'acceptable' in more than just the body tag... But you probably would not want to ship an image file such as a jpg in the process if all you really want to do is redefine the color to, say, white, e.g.:

bgcolor="#ffffff"
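A minimal sketch of that replacement, assuming the attribute value is quoted and preceded by whitespace (i.e. it really is a tag attribute, not text inside a script). The name background_to_bgcolor and the default of #ffffff are illustrative choices, not part of the original proposal.

```perl
use strict;
use warnings;

# Hypothetical helper: swap a background="..." attribute (body or any
# other tag) for a plain bgcolor, so no image file has to be shipped.
# Assumes quoted values and a leading space, i.e. a real tag attribute.
sub background_to_bgcolor {
    my ($html, $color) = @_;
    $color = '#ffffff' unless defined $color;
    $html =~ s/(\s)background\s*=\s*(["'])[^"']*\2/${1}bgcolor="$color"/gi;
    return $html;
}
```

For example, background_to_bgcolor('<body background="white_block.jpg">') yields '<body bgcolor="#ffffff">'. If a tag already carries a bgcolor, this sketch does not check for it and would leave two.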

The RegEx I proposed would, of course, remove the string

background="white_block.jpg"

from any 'input' provided, since it does not care whether
the value is alphanumeric or not; it was designed to
remove whatever follows the "=", as it were...
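The earlier RegEx is not quoted in this message, so here is a hypothetical reconstruction of the same idea: drop bgcolor= and background= attributes regardless of what the value looks like, whether it is double-quoted, single-quoted or bare. strip_bg_attributes is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical reconstruction, not the original regex: remove bgcolor=
# and background= attributes whatever their value is, quoted with ",
# quoted with ', or left unquoted (up to whitespace or the closing >).
sub strip_bg_attributes {
    my ($html) = @_;
    $html =~ s/\s+(?:bgcolor|background)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)//gi;
    return $html;
}
```

This works on raw text rather than parsed HTML, so it would also fire on a literal " bgcolor=" sitting in ordinary body text.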

> as wallpaper takes precedence over bgcolor, or using javascript:
> document.bgColor='ff0000'; // not sure if this runs on NS too
[..]

This part of the problem is where one needs to expand the
RegEx as well, to deal with possible contamination in a
javascript element, most likely triggered by the 'onload'
handler...

But the 'patterns'

        document.bgColor
        document.background

etc. could likewise be 'targeted' for conversion, on the
fly and/or 'in place', with the same type of filtering
and an appropriate RegEx.
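A minimal sketch of the 'conversion' variant, which rewrites the assigned value instead of deleting the statement. Assumptions: only quoted values are handled, bgColor assignments get a replacement color, and document.background assignments are emptied so no wallpaper image is requested. The regexes deliberately omit /i, since javascript property names are case-sensitive. neutralize_js_background is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: convert, rather than delete, javascript assignments
# such as document.bgColor='ff0000' found in <script> blocks or in an
# onload="..." attribute. Only quoted values are handled. No /i flag:
# javascript property names are case-sensitive.
sub neutralize_js_background {
    my ($text, $color) = @_;
    $color = '#ffffff' unless defined $color;
    # color assignments get the replacement color ...
    $text =~ s/(document\.bgColor\s*=\s*)(["'])[^"']*\2/$1$2$color$2/g;
    # ... and wallpaper assignments are emptied so no image is requested
    $text =~ s/(document\.background\s*=\s*)(["'])[^"']*\2/$1$2$2/g;
    return $text;
}
```

Keeping the statement in place, rather than removing it, means any surrounding javascript still parses the same way.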

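The trick in those cases, of course, is that javascript
allows white space on either side of the "=", and the
value may be wrapped in single quotes, double quotes, or
none at all, so one is looking at something like

$line =~ s/document\.bgColor\s*=\s*(["']?)[^"'\s;]+\1\s*;?//gi;

where (["']?) captures whichever quote (if any) opens the
value and \1 requires the same one to close it. Note that
the "." has to be escaped, since a bare "." matches any
character, and that a "^" inside [...] is only special as
the first character, so [^"^'\s] would also exclude "^".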
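A small, self-contained harness around that substitution, written with the "." escaped and with \1 forcing the closing quote to match the opening one. The function name strip_bgcolor_js and the sample lines are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical wrapper: delete a document.bgColor assignment, whether the
# value is single-quoted, double-quoted or bare. \1 makes the closing
# quote match the opening one; "\." matches only a literal dot.
sub strip_bgcolor_js {
    my ($line) = @_;
    $line =~ s/document\.bgColor\s*=\s*(["']?)[^"'\s;]+\1\s*;?//gi;
    return $line;
}

# Usage: run a few sample lines through the filter.
for my $line (
    q{document.bgColor='ff0000';},
    q{document.bgColor = "#ff0000" ;},
    q{x = 1; document.bgColor=red; y = 2;},
) {
    print strip_bgcolor_js($line), "\n";
}
```

A line with mismatched quotes, such as document.bgColor='red", is left alone rather than half-removed.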

HTH.



