Hi All. I'm trying to either strip everything outside the <body> tags or match everything inside them for three nearly identical web pages. I need the content of the pages minus header/footer.
Ultimately I need to glue them all together into one valid HTML doc, but that's for later.

In general, should I be regexing for the stuff I want or the stuff I don't want? Does it matter which? If I "strip" out the HTML I don't want, do I save a step when it comes to the next phase, operating on the remaining content?

--------------------- stripping method (not working) ----------------------
my @urls = qw(
    'http://colomanager/saved/1358381265931.htm'
    'http://colomanager/saved/1375561135115.htm'
    'http://colomanager/saved/1388446037003.htm'
);

foreach my $u (@urls) {
    get "$u\n";
    while ($u =~ m/(.*?)<html>.*<body>(.*)/ /(.*?)<\/body>.*<\/html>(.*)/);
    print "$1\n";
-------------------------- end stripping method ------------------------

# The above does not compile.

-------------------------- extracting (it compiles) --------------------
my @urls = qw(
    'http://colomanager/saved/1358381265931.htm'
    'http://colomanager/saved/1375561135115.htm'
    'http://colomanager/saved/1388446037003.htm'
);

foreach my $u (@urls) {
    get "$u\n";
    $u =~ /^<div .*<\/div>$/;
    # What next? Where does the extracted stuff go? How do I access it?
}

--
Monkeys are superior to men in this: when a monkey looks into a mirror, he sees a monkey.
-- Malcolm De Chazal
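
For the "where does the extracted stuff go" question, here is a minimal sketch of the extracting approach. It assumes each page has already been fetched into a string; with LWP::Simple that would be something like `my $html = get($u);`, since `get` returns the page content (or undef on failure) rather than changing `$u`. The sub name `extract_body` and the two sample pages are made up for illustration and stand in for the real pages. The text matched by the parentheses ends up in `$1`, which the sub hands back.

```perl
use strict;
use warnings;

# Return whatever sits between <body ...> and </body>,
# or undef (in scalar context) if the tags are not found.
sub extract_body {
    my ($html) = @_;

    # /s lets "." match newlines, /i ignores tag case.
    # The parenthesised (.*?) is capture group 1, available as $1.
    if ( $html =~ m{<body[^>]*>(.*?)</body>}si ) {
        return $1;
    }
    return;
}

# Hypothetical stand-ins for the fetched pages (not the real content).
my @pages = (
    "<html><head><title>A</title></head>\n<body>\n<div>one</div>\n</body></html>",
    "<html><body class=\"x\"><div>two</div></body></html>",
);

# Collect the body of every page that matched.
my @bodies;
foreach my $html (@pages) {
    my $body = extract_body($html);
    next unless defined $body;    # skip pages without a <body>
    push @bodies, $body;
}

# Glue the pieces back into a single (very bare) document.
my $combined = "<html><body>" . join( "\n", @bodies ) . "</body></html>";
print "$combined\n";
```

Two side notes, as far as I can tell: `qw( ... )` already quotes its words, so the single quotes in the lists above become part of each URL, and matching against `$u` tests the URL string itself rather than the page it points to. A regex like this is fragile against unusual markup, so treat it as a starting point only.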