-----Original Message-----
From: Johnstone, Colin
Sent: Monday, 13 October 2003 10:23 AM
To: 'Wiggins d'Anconia'
Subject: RE: Reg ex help

Thank you for responding to my query. As a newbie, this was the best way I could think of to tackle the problem, but I would like to learn to do it the right way. Maybe you could steer me in the right direction.

On each press release we publish on our website, I want to provide a "print this page" link. As a learning exercise I want to write a CGI script that pulls from the HTML only the content needed for printing and displays it in a popup: the heading, the date, the body of the story and the author's byline.

To do this I have added tag sets to the presentation templates our CMS uses, so that the generated press release has the following tag sets in the HTML around the appropriate content:

<date></date>
<headline></headline>
<byline></byline>
<story></story>

It is my intention to pass to the script the path to the file to be printed. This is what I have come up with already.

    #!/usr/bin/perl

    print "Content-Type: text/html\n\n";

    $physicalPath = "/web/schooled/www/";

    # Test to see if there is data in the query string
    if( length( $ENV{'QUERY_STRING'} ) <= 0){
        print "<p>Invalid Attempt to use this application</p>";
    }
    else{
        @pairs = split(/&/, $ENV{'QUERY_STRING'});
        foreach $pair ( @pairs ){
            # Split the pair up into individual variables.
            local($name, $value) = split(/=/, $pair);

            # If they try to include server side includes, erase them, so they
            # aren't a security risk if the html gets returned. Another
            # security hole plugged up.
            $value =~ s/<!--(.|\n)*-->//g;

            if($name ne "path"){
                print "<p>Invalid Attempt to use this application</p>";
            }
            else{
                # so we've got this far
                $pageToPrint = $physicalPath . $value;
                if(! -e $pageToPrint){
                    print "file doesn't exist";
                }
                else{
                    open (IN, "<$pageToPrint") or die("Cannot Open: $!");
                    while( my $record = <IN> ){
                        chomp $record;
                        $record =~ /<date>(.*?)<\/date>/;
                        print $1;
                    }
                    close IN;
                }
            }
        }
    }

I could get the CMS to produce a printer-friendly page as well, but this won't help me learn CGI concepts. If you can point me in the right direction I'd appreciate it.

-----Original Message-----
From: Wiggins d'Anconia [mailto:[EMAIL PROTECTED]
Sent: Saturday, 11 October 2003 2:37 PM
To: Johnstone, Colin
Cc: [EMAIL PROTECTED]
Subject: Re: Reg ex help

Johnstone, Colin wrote:
> Gidday All,
>
> I am writing a print this page script.
>
> I have slurped in the page to be printed and now want to strip out the
> stuff to print.
>
> To do this I have created the following tag sets in the html page.
>
> <date></date>
> <headline></headline>
> <story></story>
>
> I need to write a regex to achieve this.
>

In general, attempting to write regexes to parse HTML is a bad idea. Now having said that...

Assuming you are only after these three sets of tags and they have no attributes, and I am assuming they have some content that you want to grab, you can start out simple and work your way up from there....

/<date>(.*)<\/date>/

This has a number of problems, but might work. What have you tried, what does an actual example of the data look like, and how are you handling the actual data (aka looping using a while <FH>? foreach over a split, etc.)??

Give us more, and we will reply in kind....

http://danconia.org
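
For what it's worth, here is a minimal sketch of one way the tag extraction discussed above might look, under the same assumption Wiggins states: the tags carry no attributes and each appears at most once. It works on the whole page held in one string rather than line by line, so content that spans several lines (likely for <story>) is still matched, and it checks that a match succeeded before using $1. The helper name extract_sections and the sample page are made up for illustration; neither comes from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash mapping each
# requested tag name to the text found between <tag> and </tag>.
# Tags that do not occur in $html are simply left out of the hash.
sub extract_sections {
    my ($html, @tags) = @_;
    my %found;
    for my $tag (@tags) {
        # Non-greedy (.*?) stops at the first closing tag; the /s modifier
        # lets "." match newlines so content may span several lines.
        if ($html =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}s) {
            $found{$tag} = $1;
        }
    }
    return %found;
}

# Made-up sample standing in for a press release read into one string.
my $html = <<'HTML';
<html><body>
<date>13 October 2003</date>
<headline>Example headline</headline>
<story>First paragraph.
Second paragraph.</story>
</body></html>
HTML

my %section = extract_sections($html, qw(date headline byline story));
for my $tag (qw(date headline byline story)) {
    next unless exists $section{$tag};   # e.g. no <byline> in the sample
    print "$tag: $section{$tag}\n";
}
```

In a real script the string would come from reading the file in one go (for example with local $/ set to undef before reading the filehandle). This remains a regex-based shortcut with the limits already mentioned in the thread, not a general HTML parser.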