>>>>> "lm" == lan messerschmidt <lan.messerschm...@gmail.com> writes:

  lm> On Mon, Nov 30, 2009 at 12:40 PM, Julia Gallardo Lomeli
  lm> <juliaaa...@gmail.com> wrote:
  >> Hi,
  >> 
  >> Let's say I want grep to find all <div
  >> class="photo">whateverElementGoesInsideTheseDivTags</div> in index.html
  >> 
  >> I am using the code below but it seems that it's not working
  >> 
  >> 
  >> grep -o "<div class=\"photo\">[^()]*</div>"  index.html
  >> 

  lm> Perl's grep can do:

  lm> # cat 1.html
  lm> <div class="photo">whateverElementGoesInsideTheseDivTags</div>
  lm> <div class="photo">(wha)teverElementGoesInsideTheseDivTags</div>
  lm> <div class="image">whateverElementGoesInsideTheseDivTags</div>
  lm> <div class="css">whateverElementGoesInsideTheseDivTags</div>

  lm> # perl -e '@found=grep { m|<div class="photo">[^()]*</div>| } <>;
  lm> print @found' 1.html
  lm> <div class="photo">whateverElementGoesInsideTheseDivTags</div>

what is [^()]* looking for? why couldn't () be inside the div tags?

if you are going to use perl for simple stuff like that, use it to its
best advantage. a -n loop and a plain match do the whole job:

        perl -ne 'print if m{<div class="photo">.*?</div>}' index.html

now that still has issues. it only works on divs that sit on a single
line, and html divs can span multiple lines. it doesn't handle nested
divs at all either: the non-greedy .*? stops at the first </div>, which
may belong to an inner div. in general html should be parsed by a real
parser. the main exception is html you control because you generate or
manage it yourself.

and i still want to know why the OP posted a shell/grep question in a
perl beginner's list. i have a feeling i won't get an answer.

uri

-- 
Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------
