FW: Reg ex help

Johnstone, Colin Sun, 12 Oct 2003 17:43:26 -0700


-----Original Message-----
From: Johnstone, Colin 
Sent: Monday, 13 October 2003 10:23 AM
To: 'Wiggins d'Anconia'
Subject: RE: Reg ex help

Thank you for responding to my query.

As a newbie, this was the best way I could think of to tackle this
problem, but I would like to learn to do it the right way. Maybe you
could steer me in the right direction.

On each press release we publish on our website, I want to provide a
"print this page" link.

As a learning exercise I want to write a CGI script that pulls from the
HTML only the content needed for printing and displays it in a popup:
the heading, the date, the body of the story and the author's byline.

So when our CMS produces the page, I add tag sets to the presentation
templates, so that the generated press release has the following tag
sets in the HTML around the appropriate content.

<date></date>
<headline></headline>
<byline></byline>
<story></story>

It is my intention to pass to the script the path to the file to be
printed. This is what I have come up with already.

#!/usr/bin/perl

print "Content-Type: text/html\n\n";

$physicalPath = "/web/schooled/www/";

# Test to see if there is data in the query String
if( length( $ENV{'QUERY_STRING'} ) <= 0){
    print "<p>Invalid Attempt to use this application</p>";
}
else{
    @pairs = split(/&/, $ENV{'QUERY_STRING'});

    foreach $pair ( @pairs ){

      # Split the pair up into individual variables.
      my ($name, $value) = split(/=/, $pair, 2);

      # If they try to include server side includes, erase them, so they
      # aren't a security risk if the html gets returned.  Another
      # security hole plugged up.
      $value =~ s/<!--(.|\n)*-->//g;

      if($name ne "path"){
         print "<p>Invalid Attempt to use this application</p>";
      }
      else{
      # so we've got this far
        $pageToPrint = $physicalPath . $value;

        if(! -e $pageToPrint){
          print "file doesn't exist";
        }
        else{
          open(IN, '<', $pageToPrint) or die("Cannot Open: $!");
          while( my $record = <IN> ){
            chomp $record;
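            # Print only when this line actually contains a <date> element;
            # otherwise $1 would still hold a stale earlier capture.
            if( $record =~ /<date>(.*?)<\/date>/ ){
              print $1;
            }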
          }
          close IN;
        }

      }

    }

}
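
For comparison, here is a minimal sketch of the extraction step for all four tag sets. It is not a reviewed solution: it assumes the whole page is held in one string (for example, slurped with `local $/;`), and the helper names `is_safe_path` and `extract_fields`, the path policy, and the sample page are made up for illustration. Because the `path` value comes straight from the query string, the sketch also rejects absolute paths and `..` segments before the value would be appended to `$physicalPath`.

```perl
use strict;
use warnings;

# Tag names taken from the message above.
my @TAGS = qw(date headline byline story);

# Assumed policy: accept only a relative path with no ".." segment.
sub is_safe_path {
    my ($path) = @_;
    return 0 unless defined $path && length $path;
    return 0 if $path =~ m{^/};                  # absolute path
    return 0 if $path =~ m{(?:^|/)\.\.(?:/|$)};  # parent-directory segment
    return 0 if $path =~ /\0/;                   # embedded NUL byte
    return 1;
}

# Pull the first occurrence of each tag out of a page held in one string.
sub extract_fields {
    my ($html) = @_;
    my %field;
    for my $tag (@TAGS) {
        # /s lets '.' cross newlines; (.*?) stops at the first closing tag.
        if ($html =~ m{<$tag>(.*?)</$tag>}s) {
            $field{$tag} = $1;
        }
    }
    return %field;
}

# Hypothetical sample page, standing in for the slurped file contents.
my $page = <<'HTML';
<html><body>
<headline>School opens new library</headline>
<date>13 October 2003</date>
<story>First paragraph.
Second paragraph.</story>
<byline>A. Writer</byline>
</body></html>
HTML

my %field = extract_fields($page);
for my $tag (@TAGS) {
    print "$tag: $field{$tag}\n" if exists $field{$tag};
}
```

Matching against the whole page rather than line by line is what allows a `<story>` that spans several lines to be captured. Nested or repeated tags, or tags with attributes, would still defeat these patterns.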

I could get the CMS to produce a printer-friendly page as well, but this
won't help me learn CGI concepts.

If you can point me in the right direction I'd appreciate it.

-----Original Message-----
From: Wiggins d'Anconia [mailto:[EMAIL PROTECTED] 
Sent: Saturday, 11 October 2003 2:37 PM
To: Johnstone, Colin
Cc: [EMAIL PROTECTED]
Subject: Re: Reg ex help

Johnstone, Colin wrote:
> Gidday All,
> 
> I am writing a print this page script.
> 
> I have slurped in the page to be printed and now want to strip out the
> stuff to print.
> 
> To do this I have created the following tag sets in the html page.
> 
> <date></date>
> <headline></headline>
> <story></story>
> 
> I need to write a regex to achieve this.
> 

In general attempting to write regexes to parse HTML is a bad idea. Now 
having said that...

Assuming you are only after these three sets of tags and they have no 
attributes, and I am assuming they have some content that you want to 
grab, you can start out simple and work your way up from there....

/<date>(.*)<\/date>/

This has a number of problems, but might work.  What have you tried,
what does an actual example of the data look like, and how are you
handling the actual data (e.g. looping using a while <FH>? foreach over
a split, etc.)?
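
Two of the problems with that pattern can be shown in a short sketch. The sample strings are hypothetical, not taken from the actual press releases: a greedy `(.*)` runs on to the last closing tag on the line, and without the `/s` modifier `.` does not match a newline, so an element spread over several lines does not match at all.

```perl
use strict;
use warnings;

# Hypothetical sample data: two <date> elements on one line.
my $html = '<date>13 Oct 2003</date> text <date>14 Oct 2003</date>';

# Greedy: (.*) runs to the LAST </date>, swallowing the markup in between.
my ($greedy) = $html =~ /<date>(.*)<\/date>/;

# Non-greedy: (.*?) stops at the FIRST </date>.
my ($lazy) = $html =~ /<date>(.*?)<\/date>/;

# Without /s, '.' does not match a newline, so a multi-line element fails.
my $multi = "<story>line one\nline two</story>";
my ($no_s)   = $multi =~ /<story>(.*?)<\/story>/;   # no match, stays undef
my ($with_s) = $multi =~ /<story>(.*?)<\/story>/s;  # captures both lines

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "no /s:  ", (defined $no_s ? $no_s : '(no match)'), "\n";
print "with /s: $with_s\n";
```

Reading the file one line at a time has the same effect as leaving out `/s`: a tag pair that opens on one line and closes on another is never seen as a whole.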

Give us more, and we will reply in kind....

http://danconia.org

