On Wed, 22 Jan 2003, Rob Dixon wrote:

> Hi George. I think you'd have had an answer by now if there was
> one. I can't think of anything but I wasn't willing to post and say
> 'it can't be done' without waiting for others' ideas.
>
> George P. wrote:
> > But now, I need to check for all classes other than "text";
> >
> > This has me stumped!!
> >
> > For eg:
> > $str = '<TD class="text1">';
> >
> > $class = 'text[0-9]+';
> > if ($str =~ /class="$class"/)
> > {
> > print "TAG has this class\n";
> > }
>
> I still don't understand exactly why you can't use
>
>     if ($str !~ /class="$class"/) {
>         print "TAG doesn't have this class\n";
>     }
>
> or even something ugly like
>
>     if ($str =~ /class="$class"/) { }
>     else {
>         print "TAG doesn't have this class\n";
>     }
>
> If you can describe to us a circumstance where you 'need' this
> functionality I'm sure we'll come up with an answer.
>

I'll explain what I'm trying to do.

I'm writing a program that parses an HTML file.
The HTML file contains an article whose text sits between
certain tags (such as <TD>) that have a specific class name.

So you can have something like
<TABLE>
  <TR>
    <TD class="articletext">
        This article is just an example.
        </TD>
  </TR>
</TABLE>

And, I have to pick "This article is just an example" from that
file.

Which class name to pick differs from file to file. So, although
I have to pick all text within a TD tag having class name
"articletext" in the previous example, I might have to pick
all text within a SPAN tag having class name "anotherarticletext"
in another HTML file.

What class name to pick is decided by what file I'm parsing.

So, what did I do?
I created a map file. This map file contains the filename
and the tag-class combination which I have to pick.

I then read the file, and checked if it has that tag-class
combination. If it does I get the text that falls within
that tag.
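
A minimal sketch of such a map lookup. The map format here (filename,
tag and class pattern separated by whitespace) and the name wanted_tag
are assumptions made for illustration, not the actual map file:

```perl
use strict;
use warnings;

# Hypothetical map format (an assumption): filename, tag, class pattern,
# separated by whitespace, one entry per line.
my $map_text = <<'END';
news.html    TD    articletext
sports.html  SPAN  anotherarticletext
index.html   TD    text[0-9]+
END

# Build a lookup: filename => { tag => ..., class => ... }
my %map;
for my $line (split /\n/, $map_text) {
    next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
    my ($file, $tag, $class) = split ' ', $line;
    $map{$file} = { tag => $tag, class => $class };
}

# Returns 1 if $str is an opening tag matching the map entry for $file,
# 0 otherwise. The class is treated as a regex; either quote style is
# accepted around the attribute value.
sub wanted_tag {
    my ($file, $str) = @_;
    my $entry = $map{$file} or return 0;
    my ($tag, $class) = @{$entry}{qw(tag class)};
    return $str =~ /<\Q$tag\E\s+class=(["'])$class\1\s*>/i ? 1 : 0;
}

print wanted_tag('news.html',  '<TD class="articletext">') ? "take\n" : "skip\n";
print wanted_tag('index.html', "<td class='text12'>")      ? "take\n" : "skip\n";
print wanted_tag('index.html', '<TD class="textual">')     ? "take\n" : "skip\n";
```

The three calls print "take", "take" and "skip".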

Assume $str contains a tag specification:
$str = "<TD class='articletext'>";

To check whether that tag-class combination exists,
I simply do:
if ($str =~ /<$tag class='$class'>/i) {
        # Take the text
}
else {
        # Don't take the text
}

This code helps a lot when I want to pick up a specific type
of class, like all those classes which start with the word
"text" and have a number following it.
This way the class name given in my map file will be "text[0-9]+"

Other than this, I also wanted to remove a few tag-class
combinations that come in between the tags that I want to pick up.
Eg:
<TD class="articletext">
  This text has to be picked up
  <SPAN class="removetext">
    This text has to be ignored
  </SPAN>
  This text has to also be picked up.
</TD>

So I wrote similar code to find the tags that I want to
remove, and if a tag-class combination matches, I ignore
its text.
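
A rough sketch of that pick-up-then-remove step, using the example
above. The variable names are illustrative, and it assumes
double-quoted class attributes and no element nested inside another
element of the same tag name (a real parser such as HTML::Parser
would be more robust than regexes here):

```perl
use strict;
use warnings;

my $html = <<'END';
<TD class="articletext">
  This text has to be picked up
  <SPAN class="removetext">
    This text has to be ignored
  </SPAN>
  This text has to also be picked up.
</TD>
END

# Illustrative names: the pick-up pair and the remove pair.
my ($tag, $class)       = ('TD',   'articletext');
my ($rm_tag, $rm_class) = ('SPAN', 'removetext');

# Grab the body of the first pick-up element (non-greedy match).
my ($body) = $html =~ m{<$tag\s+class="$class"\s*>(.*?)</$tag>}is;
$body = '' unless defined $body;

# Drop every remove-element together with its contents.
$body =~ s{<$rm_tag\s+class="$rm_class"\s*>.*?</$rm_tag>}{}gis;

# Collapse runs of whitespace and trim both ends.
$body =~ s/\s+/ /g;
$body =~ s/^ | $//g;

print "$body\n";
```

This prints the two pick-up sentences on one line, without the
ignored SPAN text.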

This code works fine when you give literal class names, and also
works for regex class names like "text[0-9]+".
But now one more situation has arisen: I want to remove all classes
other than the pick-up class.

So, if I'm picking up text from class "articletext", I want to
remove all classes other than "articletext".

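I wanted to keep the current code setup and just change the removing
class name to something like "[^(articletext)]", expecting it to
remove all classes other than "articletext". But this cannot work:
square brackets form a character class, so that pattern matches one
single character that is not one of those letters or parentheses,
not "any class other than the word articletext".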
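
One way to say "any class other than X" inside a single pattern is a
negative lookahead. The sketch below is one possible approach, not
something from the original code; the helper name other_than is made
up, and it assumes double-quoted class attributes, matching the
/class="$class"/ test quoted earlier:

```perl
use strict;
use warnings;

# Build a class pattern meaning "any class value except one matching $keep".
# The (?!...) lookahead rejects the position where $keep followed by the
# closing quote would match; [^"]* then consumes whatever class is there.
sub other_than {
    my ($keep) = @_;
    return qq{(?!(?:$keep)")[^"]*};
}

my $class = other_than('articletext');

for my $str ('<TD class="articletext">',
             '<SPAN class="removetext">',
             '<TD class="articletext2">') {
    if ($str =~ /class="$class"/) {
        print "$str: has some other class\n";
    }
    else {
        print "$str: no other class\n";
    }
}
```

Only the first tag is reported as "no other class". Because the
lookahead includes the closing quote, "articletext2" still counts as
a different class. The same helper should also accept a regex class
name such as "text[0-9]+".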

I think I'll just add one more parameter in the map file, which
will tell me when to use "=~" and when to use "!~".
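
A small sketch of that idea, assuming the extra map-file parameter is
read into a true/false value. The names class_test and $negate are
hypothetical:

```perl
use strict;
use warnings;

# $negate stands for the hypothetical extra map-file parameter:
# false means "use =~", true means "use !~".
sub class_test {
    my ($str, $class, $negate) = @_;
    my $hit = ($str =~ /class="$class"/) ? 1 : 0;
    return $negate ? 1 - $hit : $hit;
}

print class_test('<SPAN class="removetext">', 'articletext', 1), "\n";
print class_test('<TD class="articletext">',  'articletext', 1), "\n";
print class_test('<TD class="text7">',        'text[0-9]+',  0), "\n";
```

This prints 1, 0 and 1. Note that with the "!~" setting a tag with no
class attribute at all also counts as "not this class", which may or
may not be what is wanted.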



Thanks for your help.

bye,
George .


