Re: HTML::TokeParser munging characters

2013-12-28 Thread Lars Noodén
On 12/28/2013 05:52 PM, Shawn Wilson wrote: > The parser has done what it's supposed to. IDK if you can alter the > encoding in it. Maybe you can and that's what you're looking for > (encoding or character set). I'd first try binmode UTF-8 but you'll > probably just end up handling this with a regex.

Re: HTML::TokeParser munging characters

2013-12-28 Thread Shawn Wilson
The parser has done what it's supposed to. IDK if you can alter the encoding in it. Maybe you can and that's what you're looking for (encoding or character set). I'd first try binmode UTF-8 but you'll probably just end up handling this with a regex. "Lars Noodén" wrote: >If there is a better list
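A minimal sketch of the "decode first, binmode on output" idea mentioned above, assuming the input arrives as UTF-8 bytes. The sample markup and variable names are hypothetical; the original poster's data is not shown in this listing.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Hypothetical input: UTF-8 encoded bytes, as they might arrive from a file or socket.
my $bytes = "<p>Lars Nood\xC3\xA9n</p>";

# Decode to Perl characters before parsing, so the parser never sees raw bytes.
my $html = decode('UTF-8', $bytes);

my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";
$p->get_tag('p');
my $name = $p->get_text('/p');

# Encode again on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$name\n";
```

Whether this matches the original problem depends on what encoding the page was actually in, which the thread snippet does not say.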

Re: html::tokeparser

2010-12-07 Thread Rob Dixon
On 07/12/2010 20:42, Jim Gibson wrote: On 12/7/10 Tue Dec 7, 2010 12:17 PM, "shawn wilson" scribbled: i'm messing up somewhere along the way here... i'm trying to get data from a table in a page which should always get defined like this: and i'm looking for an element of the table that l

Re: html::tokeparser

2010-12-07 Thread Jim Gibson
On 12/7/10 Tue Dec 7, 2010 12:17 PM, "shawn wilson" scribbled: > i'm messing up somewhere along the way here... > > i'm trying to get data from a table in a page which should always get > defined like this: > > > > > and i'm looking for an element of the table that looks like this: > > Le

Re: HTML::TokeParser question

2006-12-20 Thread Rob Dixon
Mathew Snyder wrote: > > I have a script which runs WWW::Mechanize to obtain a page so it can be parsed > for email addresses. However, I can't recall how I'm supposed to use > HTML::TokeParser to get what I need. This is the pertinent part of the > script: > > ... > my $data = $agent->conte

Re: HTML::TokeParser question

2006-12-20 Thread Rob Dixon
Mumia W. wrote: > On 12/19/2006 10:58 PM, Mathew Snyder wrote: >> I have a script which runs WWW::Mechanize to obtain a page so it can be >> parsed for email addresses. However, I can't recall how I'm supposed to use >> HTML::TokeParser to get what I need. This is the pertinent part of the >> sc

Re: HTML::TokeParser question

2006-12-20 Thread Mumia W.
On 12/19/2006 10:58 PM, Mathew Snyder wrote: I have a script which runs WWW::Mechanize to obtain a page so it can be parsed for email addresses. However, I can't recall how I'm supposed to use HTML::TokeParser to get what I need. This is the pertinent part of the script: ... my $data = $ag

Re: HTML::TokeParser, get HTML

2005-11-22 Thread Ing. Branislav Gerzo
Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42 (+0100) thinks about: IBG> while(my $tag = $parser->get_tag('b')) { IBG> my $text = $parser->get_text(); IBG> last if $text =~ /^(this and that|or that and this)/i; IBG> } IBG> my $text = $parser->get_text('b', 'b

RE: HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Brian Volk
tr") foreach (1..11); $parser->get_tag("td"); my $lg_desc = $parser->get_text(); print "Large Description: $lg_desc\n"; Thanks again! - Brian > -Original Message- > From: Charles K. Clarkson [mailto:[EMAIL PROTECTED] > Sent: Wedn

RE: HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Charles K. Clarkson
Brian Volk <[EMAIL PROTECTED]> wrote: : Hi All, : : I'm trying to only get the text from w/in a certain table in : the HTML source. Right now I am getting all the text in the : source. : Here is my script I made notes in the script. : : #!/usr/bin/perl -w Always use strict. use strict

Re: Html::tokeparser::simple

2003-11-28 Thread R. Joseph Newton
Paul Kraus wrote: > Someone want to show me how this module can help parse out html? > > I want to grab text between text being able to apply regexp to > get what I want. > > The problem is my text is among 10,000 td tags. With the only difference > being what the above tag has in it. > > So if t

Re: Html::tokeparser::simple

2003-11-26 Thread drieux
On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote: Someone want to show me how this module can help parse out html? I want to grab text between text being able to apply regexp to get what I want. The problem is my text is among 10,000 td tags. With the only difference being what

Re: Html::tokeparser::simple

2003-11-26 Thread R. Joseph Newton
Paul Kraus wrote: > Someone want to show me how this module can help parse out html? > > I want to grab text between text being able to apply regexp to > get what I want. > > The problem is my text is among 10,000 td tags. With the only difference > being what the above tag has in it. > > So if t

Re: HTML::TokeParser

2003-02-13 Thread R. Joseph Newton
Dan Muey wrote: > That's exactly it, thank you very much! > One more tiny little problem, > > I have it grabbing the title, links and img tags perfectly except for one minor snafu > > It won't grab/parse img tags that are between <a> tags, IE an image that is a link. > I tried having it parse 's fir

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey
Excellent thanks! I'll look this over and give her a go! I appreciate your time and energy Dan > -Original Message- > From: david [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, February 12, 2003 3:36 PM > To: [EMAIL PROTECTED] > Subject: RE: HTML::TokeParser &g

RE: HTML::TokeParser

2003-02-12 Thread david
Dan Muey wrote: > whatever is in between the > I wonder if it's possible to do something like this : > > if($token->[0] eq 'a'){ > print $token->[1]{href} || "what?","\n"; > my $link_guts = $tok->get_trimmed_text("/a"); > > and then somehow grab the 'src' and 'alt' attributes from eac

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey
Thanks!! That's what I figured and I was actually just formulating another question based on it :: if($token->[0] eq 'a'){ print $token->[1]{href} || "what?","\n"; # print $tok->get_trimmed_text("/a"); print "\n"; } elsif($token->[0] eq 'img') { print $token->[1]{src} || "again?","\n";
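A short sketch of the approach discussed in this thread: asking get_tag for both 'a' and 'img' and not calling get_trimmed_text("/a"), since reading the link text would also read past any <img> inside the link. The sample markup and the @found array are hypothetical, not taken from the posters' scripts.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup: one image inside a link, one plain link, one bare image.
my $html = '<a href="/home"><img src="logo.png" alt="Logo"></a>'
         . '<a href="/about">About</a><img src="banner.png">';

my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my @found;
while (my $tag = $p->get_tag('a', 'img')) {
    # For a start tag, get_tag returns [$tagname, \%attr, \@attrseq, $text].
    if ($tag->[0] eq 'a') {
        push @found, 'link: ' . ($tag->[1]{href} || '-');
        # No get_trimmed_text("/a") here: it would consume an <img> inside the link.
    }
    else {
        push @found, 'image: ' . ($tag->[1]{src} || '-');
    }
}

print "$_\n" for @found;
```

If the link text is also needed, a get_token loop that tracks start and end tags is one alternative, at the cost of more bookkeeping.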

RE: HTML::TokeParser

2003-02-12 Thread david
Dan Muey wrote: > > > The script you sent me does get the images, mine doesn't so I screwed up > somewhere along the way. I'll take your original and modify it one step at > a time to narrow down what I did wrong. > > I'll post back when I get it right so that hopefully someone can learn > fro

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey
The script you sent me does get the images, mine doesn't so I screwed up somewhere along the way. I'll take your original and modify it one step at a time to narrow down what I did wrong. I'll post back when I get it right so that hopefully someone can learn from my dumbness. Thanks Dan >

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey
Funny, here's the script since I modified it perhaps I jacked something up Also I had it checking meta tags :: my $name = $token->[1]{name} || "-"; my $http = $token->[1]{'http-equiv'} || "-"; my $cont = $token->[1]{content} || "-"; That would grab content for all of them, The name if it was a nam

RE: HTML::TokeParser

2003-02-11 Thread david
Dan Muey wrote: > > It won't grab/parse img tags that are between <a> tags, IE an image that > is a link. I tried having it parse 's first then 's but that > didn't work. Any thoughts?? > Thanks for all your help! > what do you mean? the following seems to be working: #!/usr/bin/perl -w use stric

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey
> Dan Muey wrote: > > > > >> > >> I am trying to use HTML::TokeParser > >> From the cpan page for this I used this example : > >> > >> while (my $token = $p->get_tag("a")) { > >> my $url = $token->[1]{href} || "-"; > >> my $text

Re: HTML::TokeParser

2003-02-11 Thread Rob Dixon
Dan Muey wrote: >> I am trying to use HTML::TokeParser >> From the cpan page for this I used this example : >> >> while (my $token = $p->get_tag("a")) { >> my $url = $token->[1]{href} || "-"; >> my $text = $p->get_trimmed_text("/a"); >

RE: HTML::TokeParser

2003-02-11 Thread david
Dan Muey wrote: > >> >> I am trying to use HTML::TokeParser >> From the cpan page for this I used this example : >> >> while (my $token = $p->get_tag("a")) { >> my $url = $token->[1]{href} || "-"; >> my $text = $p->get_trimmed_tex

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey
> > I am trying to use HTML::TokeParser > From the cpan page for this I used this example : > > while (my $token = $p->get_tag("a")) { > my $url = $token->[1]{href} || "-"; > my $text = $p->get_trimmed_text("/a"); >

Re: HTML::TokeParser and &nbsp;

2003-01-30 Thread David Eason
I suppose &nbsp; is a browser directive, but TokeParser returns an actual character. Will research HTML::Entities' relationship to HTML::TokeParser later...probably much later, because now it's working. Dave -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTE

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread Rob Dixon
"John W. Krahn" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]... > Rob Dixon wrote: > > > > "David Eason" <[EMAIL PROTECTED]> wrote in message > > news:[EMAIL PROTECTED]... > > > I get a space in my editor output window, but when I run it from a c

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread John W. Krahn
Rob Dixon wrote: > > "David Eason" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED]... > > I get a space in my editor output window, but when I run it from a cmd > > window, I get the other character. (This is under Windows 2000 and > perl > > 5.8.0) > > Windows co

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread Rob Dixon
"David Eason" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]... > I get a space in my editor output window, but when I run it from a cmd > window, I get the other character. (This is under Windows 2000 and perl > 5.8.0) Hi David Windows command prompt doesn't supp

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread David Eason
I get a space in my editor output window, but when I run it from a cmd window, I get the other character. (This is under Windows 2000 and perl 5.8.0) "John W. Krahn" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]... > David Eason wrote: > > > > John W. Krahn wrote: >

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread John W. Krahn
David Eason wrote: > > John W. Krahn wrote: > > According to HTML::Entities > > > > # Some extra Latin 1 chars that are listed in the HTML3.2 draft > > (21-May-96) > > copy => '©', # copyright sign > > reg=> '®', # registered sign > > nbsp => "\240", # non breaking space > > Thanks,

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread David Eason
John W. Krahn wrote: > According to HTML::Entities > > # Some extra Latin 1 chars that are listed in the HTML3.2 draft > (21-May-96) > copy => '©', # copyright sign > reg=> '®', # registered sign > nbsp => "\240", # non breaking space Thanks, John, I had no idea where to look. I did

Re: HTML::TokeParser and &nbsp;

2003-01-28 Thread John W. Krahn
David Eason wrote: > > I recreated a problem in my program in a small code sample. The code below > is giving me the following output at the console and I have no idea why: > > Output: > line 1áline 3 > > I am seeing a lower case 'a' with an acute accent between 'line 1' and 'line > 3'. Any idea
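A small sketch of what this thread describes, assuming input similar to the poster's (his actual sample is not shown here, so the markup below is hypothetical). get_text decodes entities, so a non-breaking space entity comes back as character 0xA0 (octal 240, as in the HTML::Entities table quoted in the replies). In the DOS code pages 437 and 850, byte 0xA0 displays as an a with an acute accent, which would be consistent with the console output reported.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical input with a non-breaking space entity between the two words.
my $html = '<p>line 1&nbsp;line 3</p>';

my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";
$p->get_tag('p');
my $text = $p->get_text('/p');

# The entity has been decoded into a single character, code 0xA0.
printf "character code: 0x%02X\n", ord(substr($text, 6, 1));

# One way to get an ordinary space for console output.
(my $plain = $text) =~ tr/\xA0/ /;
print "$plain\n";
```

Replacing the character with tr is only one option; leaving it in place and writing output in an encoding the terminal understands would also work.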

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Jeff 'japhy' Pinyan
On Nov 25, sulfericacid said: >#!/usr/local/bin/perl -w > >my %form; >my $content; > >my $userurl = $form{'userurl'}; Where do you think %form gets populated? You want to use the CGI.pm module: use CGI 'param'; my $userurl = param('userurl'); >print "Content-type: text/html\n\n"; > >use di

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread sulfericacid
Ok, late last night I realised the semicolons were missing among a few other small details. Since then I repaired all of them and ran a debugger, but it came back clean. I uploaded and tried to run it on the webserver, but it doesn't print anything. The script doesn't display any errors or logf

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Ovid
--- sulfericacid <[EMAIL PROTECTED]> wrote: > use HTML::TokeParser > my $p = HTML::TokeParser->new(\$content); > > my %meta; > while (my $token = $p->get_token) { > next unless $token->[1] eq 'meta' && $token->[0] eq 'S'; > $meta{$token->[2]->{name}} = $token->[2]{content}; > } >

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Bob Rasey
On Mon, 2002-11-25 at 11:30, Jeff 'japhy' Pinyan wrote: > On Nov 24, sulfericacid said: > > >use LWP::Simple > >use HTML::TokeParser > > You're missing semi-colons after those two 'use' statements. And while you're at it: $username =~ s/e/u/

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Jeff 'japhy' Pinyan
On Nov 24, sulfericacid said: >use LWP::Simple >use HTML::TokeParser You're missing semi-colons after those two 'use' statements. -- Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/ RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/ what does

Re: HTML::TokeParser

2002-02-24 Thread Peter Scott
At 11:43 AM 2/25/02 +1030, Daniel Falkenberg wrote: >Hello All, > >Is it possible for HTML::TokeParser to be able to work like >HTML::TreeBuilder. I don't want to have to download the webpage I just >want to be able to view its HTML content Okay, I give up. How do you envisage being able to se