On 12/28/2013 05:52 PM, Shawn Wilson wrote:
> The parser has done what it's supposed to. IDK if you can alter the
> encoding in it. Maybe you can and that's what you're looking for
> (encoding or character set). I'd first try binmode UTF-8 but you'll
> probably just end up handling this with a regex.
The parser has done what it's supposed to. IDK if you can alter the encoding in it.
Maybe you can and that's what you're looking for (encoding or character set).
I'd first try binmode UTF-8 but you'll probably just end up handling this with
a regex.
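A minimal sketch of the "binmode UTF-8" idea, using only core modules. The sample bytes are an assumption made up for illustration; the point is only that input is decoded into characters first and the output handle is told to encode on the way out.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: raw UTF-8 bytes, as they might arrive from a fetched page.
my $bytes = "Lars Nood\xC3\xA9n";

# Decode the bytes into Perl characters before parsing or matching.
my $text = decode('UTF-8', $bytes);

# Ask STDOUT to encode characters as UTF-8 when printing.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

If the terminal itself does not expect UTF-8, the output can still look wrong, which is probably where the regex fallback mentioned above comes in.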
"Lars Noodén" wrote:
>If there is a better list
On 07/12/2010 20:42, Jim Gibson wrote:
On 12/7/10 Tue Dec 7, 2010 12:17 PM, "shawn wilson"
scribbled:
i'm messing up somewhere along the way here...
i'm trying to get data from a table in a page which should always get
defined like this:
and i'm looking for an element of the table that l
On 12/7/10 Tue Dec 7, 2010 12:17 PM, "shawn wilson"
scribbled:
> i'm messing up somewhere along the way here...
>
> i'm trying to get data from a table in a page which should always get
> defined like this:
>
>
>
>
> and i'm looking for an element of the table that looks like this:
>
> Le
Mathew Snyder wrote:
>
> I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
> for email addresses. However, I can't recall how I'm supposed to use
> HTML::TokeParser to get what I need. This is the pertinent part of the
> script:
>
> ...
> my $data = $agent->conte
Mumia W. wrote:
> On 12/19/2006 10:58 PM, Mathew Snyder wrote:
>> I have a script which runs WWW::Mechanize to obtain a page so it can be
>> parsed for email addresses. However, I can't recall how I'm supposed to use
>> HTML::TokeParser to get what I need. This is the pertinent part of the
>> sc
On 12/19/2006 10:58 PM, Mathew Snyder wrote:
I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
for email addresses. However, I can't recall how I'm supposed to use
HTML::TokeParser to get what I need. This is the pertinent part of the script:
...
my $data = $ag
Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42
(+0100) thinks about:
IBG> while(my $tag = $parser->get_tag('b')) {
IBG> my $text = $parser->get_text();
IBG> last if $text =~ /^(this and that|or that and this)/i;
IBG> }
IBG> my $text = $parser->get_text('b', 'b
tr") foreach (1..11);
$parser->get_tag("td");
my $lg_desc = $parser->get_text();
print "Large Description: $lg_desc\n";
Thanks again! -
Brian
> -----Original Message-----
> From: Charles K. Clarkson [mailto:[EMAIL PROTECTED]]
> Sent: Wedn
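The row-skipping approach in Brian's snippet can be sketched as follows. The table contents and the count of three rows are assumptions for illustration (the thread used 11); only `get_tag` and `get_trimmed_text` from HTML::TokeParser are used.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical document: three rows, with the description in the third.
my $html = <<'HTML';
<table>
<tr><td>Item</td></tr>
<tr><td>Price</td></tr>
<tr><td>A large widget</td></tr>
</table>
HTML

my $parser = HTML::TokeParser->new(\$html) or die "Cannot parse document";

# Advance past three <tr> start tags so the parser sits inside the third row.
$parser->get_tag("tr") foreach (1..3);

# Move to the first cell of that row and read its text up to </td>.
$parser->get_tag("td");
my $lg_desc = $parser->get_trimmed_text("/td");
print "Large Description: $lg_desc\n";
```

Counting rows is fragile if the page layout changes, so matching on a nearby label is probably the safer choice when one exists.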
Brian Volk <[EMAIL PROTECTED]> wrote:
: Hi All,
:
: I'm trying to only get the text from w/in a certain table in
: the HTML source. Right now I am getting all the text in the
: source.
: Here is my script I made notes in the script.
:
: #!/usr/bin/perl -w
Always use strict.
use strict
Paul Kraus wrote:
> Someone want to show me how this module can help parse out html?
>
> I want to grab text between text, being able to apply regexp to
> get what I want.
>
> The problem is my text is among 10,000 td tags. With the only difference
> being what the above tag has in it.
>
> So if t
On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote:
Someone want to show me how this module can help parse out html?
I want to grab text between text, being able to apply regexp to
get what I want.
The problem is my text is among 10,000 td tags. With the only difference
being what
Paul Kraus wrote:
> Someone want to show me how this module can help parse out html?
>
> I want to grab text between text, being able to apply regexp to
> get what I want.
>
> The problem is my text is among 10,000 td tags. With the only difference
> being what the above tag has in it.
>
> So if t
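One possible way to handle "many td tags, distinguished only by the cell before them" is to read cells one at a time and stop when a label matches. This is a sketch under assumptions: the table, the `Price` label, and the variable names are invented for illustration and do not come from Paul's page.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical table: each value cell is preceded by a label cell.
my $html = '<table><tr><td>Name</td><td>Widget</td></tr>'
         . '<tr><td>Price</td><td>9.99</td></tr></table>';

my $p = HTML::TokeParser->new(\$html) or die "Cannot parse document";

my $price;
while ($p->get_tag('td')) {
    # Read the text of the current cell and test it with a regexp.
    my $label = $p->get_trimmed_text('/td');
    next unless $label =~ /^Price$/i;

    # The cell that follows the matching label holds the wanted text.
    $p->get_tag('td') or last;
    $price = $p->get_trimmed_text('/td');
    last;
}

print "Price: ", (defined $price ? $price : 'not found'), "\n";
```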
Dan Muey wrote:
> That's exactly it, thank you very much!
> One more tiny little problem,
>
> I have it grabbing the title, links and img tags perfectly except for one minor snafu
>
> It won't grab/parse img tags that are between <a> tags, IE an image that is a link.
> I tried having it parse <a>'s fir
Excellent thanks! I'll look this over and give her a go!
I appreciate your time and energy
Dan
> -----Original Message-----
> From: david [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 12, 2003 3:36 PM
> To: [EMAIL PROTECTED]
> Subject: RE: HTML::TokeParser
>
Dan Muey wrote:
> whatever is inbetween the
> I wonder if it's possible to do something like this:
>
> if($token->[0] eq 'a'){
> print $token->[1]{href} || "what?","\n";
> my $link_guts = $tok->get_trimmed_text("/a");
>
> and then some how grab the 'src' and 'alt' attributes from eac
Thanks!!
That's what I figured and I was actually just formulating another question based on it
::
if($token->[0] eq 'a'){
print $token->[1]{href} || "what?","\n";
# print $tok->get_trimmed_text("/a"); print "\n";
}
elsif($token->[0] eq 'img') {
print $token->[1]{src} || "again?","\n";
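A sketch of the a/img handling discussed in this thread, assuming a made-up page fragment. As far as I understand, `get_trimmed_text("/a")` reads past everything up to `</a>`, including an `<img>` start tag inside the link, so this version never calls it and instead asks `get_tag` for both tag names.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical fragment: one image inside a link, one image on its own.
my $html = '<a href="/home"><img src="logo.png" alt="Logo"></a> <img src="x.gif">';

my $tok = HTML::TokeParser->new(\$html) or die "Cannot parse document";

my (@links, @images);
# get_tag returns [tag, attr hash, ...] for the next <a> or <img> start tag.
while (my $token = $tok->get_tag('a', 'img')) {
    my ($tag, $attr) = @$token;
    if ($tag eq 'a') {
        push @links, $attr->{href} || "what?";
    }
    elsif ($tag eq 'img') {
        push @images, $attr->{src} || "again?";
    }
}

print "link: $_\n"  for @links;
print "image: $_\n" for @images;
```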
Dan Muey wrote:
>
>
> The script you sent me does get the images, mine doesn't so I screwed up
> somewhere along the way. I'll take your original and modify it one step at
> a time to narrow down what I did wrong.
>
> I'll post back when I get it right so that hopefully someone can learn
> fro
The script you sent me does get the images, mine doesn't so I screwed up somewhere
along the way.
I'll take your original and modify it one step at a time to narrow down what I did
wrong.
I'll post back when I get it right so that hopefully someone can learn from my
dumbness.
Thanks
Dan
>
Funny, here's the script. Since I modified it, perhaps I jacked something up.
Also I had it checking meta tags ::
my $name = $token->[1]{name} || "-";
my $http = $token->[1]{'http-equiv'} || "-";
my $cont = $token->[1]{content} || "-";
That would grab content for all of them,
The name if it was a nam
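A small self-contained version of the meta-tag check, with an invented two-tag document. Note the quoted `'http-equiv'` key: with a hyphen inside the braces, Perl does not treat a bare `http-equiv` as a simple string key.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical head section with one "name" and one "http-equiv" meta tag.
my $html = '<meta name="author" content="Dan">'
         . '<meta http-equiv="refresh" content="30">';

my $p = HTML::TokeParser->new(\$html) or die "Cannot parse document";

my @rows;
while (my $token = $p->get_tag('meta')) {
    my $attr = $token->[1];
    my $name = $attr->{name}         || "-";
    my $http = $attr->{'http-equiv'} || "-";   # hyphenated key must be quoted
    my $cont = $attr->{content}      || "-";
    push @rows, "$name|$http|$cont";
}

print "$_\n" for @rows;
```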
Dan Muey wrote:
>
> It won't grab/parse img tags that are between <a> tags, IE an image that
> is a link. I tried having it parse <a>'s first then <img>'s but that
> didn't work. Any thoughts??
> Thanks for all your help!
>
what do you mean? the following seems to be working:
#!/usr/bin/perl -w
use stric
> Dan Muey wrote:
>
> >
> >>
> >> I am trying to use HTML::TokeParser
> >> From the cpan page for this I used this example :
> >>
> >> while (my $token = $p->get_tag("a")) {
> >> my $url = $token->[1]{href} || "-";
> >> my $text
Dan Muey wrote:
>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>
>> while (my $token = $p->get_tag("a")) {
>> my $url = $token->[1]{href} || "-";
>> my $text = $p->get_trimmed_text("/a");
>
Dan Muey wrote:
>
>>
>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>
>> while (my $token = $p->get_tag("a")) {
>> my $url = $token->[1]{href} || "-";
>> my $text = $p->get_trimmed_tex
>
> I am trying to use HTML::TokeParser
> From the cpan page for this I used this example :
>
> while (my $token = $p->get_tag("a")) {
> my $url = $token->[1]{href} || "-";
> my $text = $p->get_trimmed_text("/a");
>
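For reference, the quoted loop can be run on its own like this. The HTML string is a made-up stand-in for a fetched page; the loop body follows the quoted example.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical page content: one normal link and one anchor without href.
my $html = '<p><a href="http://www.cpan.org/">CPAN</a> and <a>  no href </a></p>';

my $p = HTML::TokeParser->new(\$html) or die "Cannot parse document";

my @found;
while (my $token = $p->get_tag("a")) {
    my $url  = $token->[1]{href} || "-";     # attribute hash is element 1
    my $text = $p->get_trimmed_text("/a");   # text up to the closing </a>
    push @found, "$url\t$text";
}

print "$_\n" for @found;
```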
I suppose &nbsp; is a browser directive, but TokeParser returns an actual
character.
Will research HTML::Entities' relationship to HTML::TokeParser
later...probably much later, because now it's working.
Dave
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTE
"John W. Krahn" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Rob Dixon wrote:
> >
> > "David Eason" <[EMAIL PROTECTED]> wrote in message
> > news:[EMAIL PROTECTED]...
> > > I get a space in my editor output window, but when I run it from a
c
Rob Dixon wrote:
>
> "David Eason" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > I get a space in my editor output window, but when I run it from a cmd
> > window, I get the other character. (This is under Windows 2000 and perl
> > 5.8.0)
>
> Windows co
"David Eason" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> I get a space in my editor output window, but when I run it from a cmd
> window, I get the other character. (This is under Windows 2000 and perl
> 5.8.0)
Hi David
Windows command prompt doesn't supp
I get a space in my editor output window, but when I run it from a cmd
window, I get the other character. (This is under Windows 2000 and perl
5.8.0)
"John W. Krahn" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> David Eason wrote:
> >
> > John W. Krahn wrote:
>
David Eason wrote:
>
> John W. Krahn wrote:
> > According to HTML::Entities
> >
> > # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96)
> > copy => '©', # copyright sign
> > reg => '®', # registered sign
> > nbsp => "\240", # non breaking space
>
> Thanks,
John W. Krahn wrote:
> According to HTML::Entities
>
> # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96)
> copy => '©', # copyright sign
> reg => '®', # registered sign
> nbsp => "\240", # non breaking space
Thanks, John, I had no idea where to look. I did
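A short sketch of what the table above implies: `&nbsp;` decodes to character 160 ("\240" in octal), not to an ordinary space. The input string is an assumption that mimics the output shown in this thread. Character 160 is displayed as an accented "a" in the DOS code pages 437 and 850, which may be why a cmd window shows a different glyph than an editor.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical input reproducing the "line 1 ... line 3" case from the thread.
my $raw = 'line 1&nbsp;line 3';

# Decode entities; &nbsp; becomes a single character with code 160.
my $decoded = decode_entities($raw);
my $code    = ord(substr($decoded, 6, 1));

# Swap the non-breaking space for a plain space before printing.
(my $clean = $decoded) =~ tr/\xA0/ /;

print "character code: $code\n";
print "$clean\n";
```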
David Eason wrote:
>
> I recreated a problem in my program in a small code sample. The code below
> is giving me the following output at the console and I have no idea why:
>
> Output:
> line 1áline 3
>
> I am seeing a lower case 'a' with an acute accent between 'line 1' and 'line
> 3'. Any idea
On Nov 25, sulfericacid said:
>#!/usr/local/bin/perl -w
>
>my %form;
>my $content;
>
>my $userurl = $form{'userurl'};
Where do you think %form gets populated? You want to use the CGI.pm
module:
use CGI 'param';
my $userurl = param('userurl');
>print "Content-type: text/html\n\n";
>
>use di
Ok, late last night I realised the semi colons were missing among a few
other small details. Since then I repaired all of them and ran a debugger,
but it came back clean. I uploaded and tried to run it on the webserver,
but it doesn't print anything. The script doesn't display any errors or
logf
--- sulfericacid <[EMAIL PROTECTED]> wrote:
> use HTML::TokeParser
> my $p = HTML::TokeParser->new(\$content);
>
> my %meta;
> while (my $token = $p->get_token) {
> next unless $token->[1] eq 'meta' && $token->[0] eq 'S';
> $meta{$token->[2]->{name}} = $token->[2]{content};
> }
>
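The quoted loop uses `get_token`, whose start-tag tokens are laid out as type, tag name, attribute hash (so the attributes are at index 2, unlike `get_tag`, where they are at index 1). A runnable sketch, assuming a made-up `$content` in place of a fetched page:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical page content standing in for the fetched document.
my $content = '<html><head>'
            . '<meta name="keywords" content="perl, html">'
            . '<meta name="author" content="someone">'
            . '</head></html>';

my $p = HTML::TokeParser->new(\$content) or die "Cannot parse document";

my %meta;
while (my $token = $p->get_token) {
    # Only start tags ("S") named "meta" are of interest.
    next unless $token->[0] eq 'S' && $token->[1] eq 'meta';

    my $attr = $token->[2];
    next unless defined $attr->{name};       # skip http-equiv style tags
    $meta{ $attr->{name} } = $attr->{content};
}

print "$_ => $meta{$_}\n" for sort keys %meta;
```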
On Mon, 2002-11-25 at 11:30, Jeff 'japhy' Pinyan wrote:
> On Nov 24, sulfericacid said:
>
> >use LWP::Simple
> >use HTML::TokeParser
>
> You're missing semi-colons after those two 'use' statements.
And while you're at it:
$username =~ s/e/u/
On Nov 24, sulfericacid said:
>use LWP::Simple
>use HTML::TokeParser
You're missing semi-colons after those two 'use' statements.
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
what does
At 11:43 AM 2/25/02 +1030, Daniel Falkenberg wrote:
>Hello All,
>
>Is it possible for HTML::TokeParser to be able to work like
>HTML::TreeBuilder? I don't want to have to download the webpage, I just
>want to be able to view its HTML content
Okay, I give up. How do you envisage being able to se
39 matches