Re: HTML::TokeParser munging characters

2013-12-28 Thread Lars Noodén
On 12/28/2013 05:52 PM, Shawn Wilson wrote: > The parser has done what it's supposed to. IDK if you can alter the > encoding in it. Maybe you can and that's what you're looking for > (encoding or character set). I'd first try binmode UTF-8 but you'll > probably just end up handling this with a regex.
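
For readers hitting the same symptom, here is a minimal sketch of the binmode idea mentioned above. It assumes the page arrives as UTF-8 bytes and that the value sits in a table cell; the helper name first_cell_text and the sample markup are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Hypothetical helper: takes the raw UTF-8 bytes of a page and returns the
# trimmed text of the first <td> cell as a character string.
sub first_cell_text {
    my ($bytes) = @_;
    # Decode bytes to characters before parsing, so the parser works on
    # whole characters rather than on individual UTF-8 bytes.
    my $html = decode('UTF-8', $bytes);
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    $p->get_tag('td') or return;
    return $p->get_trimmed_text('/td');
}

# Encode on the way out, so characters such as the degree sign are written
# as valid UTF-8 instead of as a single Latin-1 byte.
binmode(STDOUT, ':encoding(UTF-8)');

# Assumed sample input: "\xC2\xB0" is the UTF-8 byte pair for the degree sign.
my $sample = "<table><tr><td>Temperature 3.2\xC2\xB0C</td></tr></table>";
print first_cell_text($sample), "\n";
```

If the page is served in another encoding, the decode call would need that encoding's name instead of UTF-8.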

Re: HTML::TokeParser munging characters

2013-12-28 Thread Shawn Wilson
te: >If there is a better list for discussing HTML::TokeParser, I can post >there. I have a code snippet which successfully extracts a piece of a >web page. However, something goes south with the conversion to text. >What should come out as the following > > Temperature 3.

HTML::TokeParser munging characters

2013-12-28 Thread Lars Noodén
If there is a better list for discussing HTML::TokeParser, I can post there. I have a code snippet which successfully extracts a piece of a web page. However, something goes south with the conversion to text. What should come out as the following Temperature 3.2°C Humidity 94

Re: html::tokeparser

2010-12-07 Thread Rob Dixon
element of the table that looks like this: Length: 266.0m i'm fine getting the url but i'm getting this four times when i try to run the script: Use of uninitialized value in string eq at ./vt-getlen.pl line 36, line 2. and the tokeparser looks like this: my $parser = HT

Re: html::tokeparser

2010-12-07 Thread Jim Gibson
ing for an element of the table that looks like this: > > Length: > 266.0m > > > i'm fine getting the url but i'm getting this four times when i try to run > the script: > Use of uninitialized value in string eq at ./vt-getlen.pl line 36, > line 2. >

html::tokeparser

2010-12-07 Thread shawn wilson
ting the url but i'm getting this four times when i try to run the script: Use of uninitialized value in string eq at ./vt-getlen.pl line 36, line 2. and the tokeparser looks like this: my $parser = HTML::TokeParser::Simple->new( string => $content ) or die "Can't de

Re: HTML::TokeParser question

2006-12-20 Thread Rob Dixon
Mathew Snyder wrote: > > I have a script which runs WWW::Mechanize to obtain a page so it can be parsed > for email addresses. However, I can't recall how I'm supposed to use > HTML::TokeParser to get what I need. This is the pertinent part of the > script: >

Re: HTML::TokeParser question

2006-12-20 Thread Rob Dixon
Mumia W. wrote: > On 12/19/2006 10:58 PM, Mathew Snyder wrote: >> I have a script which runs WWW::Mechanize to obtain a page so it can be >> parsed for email addresses. However, I can't recall how I'm supposed to use >> HTML::TokeParser to get what I need. Th

Re: HTML::TokeParser question

2006-12-20 Thread Mumia W.
On 12/19/2006 10:58 PM, Mathew Snyder wrote: I have a script which runs WWW::Mechanize to obtain a page so it can be parsed for email addresses. However, I can't recall how I'm supposed to use HTML::TokeParser to get what I need. This is the pertinent part of the script: ..

HTML::TokeParser question

2006-12-19 Thread Mathew Snyder
I have a script which runs WWW::Mechanize to obtain a page so it can be parsed for email addresses. However, I can't recall how I'm supposed to use HTML::TokeParser to get what I need. This is the pertinent part of the script: ... my $data = $agent->content(); my $parse

Re: HTML::TokeParser, get HTML

2005-11-22 Thread Ing. Branislav Gerzo
Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42 (+0100) thinks about: IBG> while(my $tag = $parser->get_tag('b')) { IBG> my $text = $parser->get_text(); IBG> last if $text =~ /^(this and that|or that and this)/i; IBG> } IBG> my $text = $parser->get_text('b', 'b

HTML::TokeParser, get HTML

2005-11-22 Thread Ing. Branislav Gerzo
Hello all, I'm using this great module for parsing HTML files. But I run into trouble - I need get clear unchanged HTML code. Common example of using this module is (snippet): while(my $tag = $parser->get_tag('b')) { my $text = $parser->get_text(); last if $text =~ /^(this and t

RE: HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Brian Volk
Charles, Thank you! It's working! I went w/ the... foreach approach... my $parser = HTML::TokeParser::Simple->new(\$page) || die "Could not parse page"; my ($tag, $attr); $tag = $parser->get_tag("table") foreach (1..10); $parser->get_tag("

RE: HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Charles K. Clarkson
trict. use strict; : use HTML::TokeParser::Simple; [snip] : my $parser = HTML::TokeParser->new(\$page) || : die "Could not parse page"; Wrong object. my $parser = HTML::TokeParser::Simple->new( \$page ) || die "

HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Brian Volk
Hi All, I'm trying to only get the text from w/in a certain table in the HTML source. Right now I am getting all the text in the source. Here is my script; I made notes in the script. #!/usr/bin/perl -w use HTML::TokeParser::Simple; use LWP::Simple; my $url =

Re: Html::tokeparser::simple

2003-11-28 Thread R. Joseph Newton
ied on your own. Anyway, I hope this makes up for my negligence a bit. I'm not sure that HTML::TokeParser::Simple adds anything to the functionality of HTML::TokeParser for your purposes [at least what you have described here]. The Simple part mostly has to do with making the tag types and

Re: Html::tokeparser::simple

2003-11-26 Thread drieux
On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote: Someone want to show me how this module can help parse out html? I want to grab text between text being able to apply regexp to get what I want. The problem is my text is among 10,000 td tags. With the only difference being what

Re: Html::tokeparser::simple

2003-11-26 Thread R. Joseph Newton
has in it. > > So if th tag = then store text between into an array. > > Paul Have you looked into HTML::TokeParser? Might be a good place to start. Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Html::tokeparser::simple

2003-11-26 Thread Paul Kraus
Someone want to show me how this module can help parse out html? I want to grab text between text being able to apply regexp to get what I want. The problem is my text is among 10,000 td tags. With the only difference being what the above tag has in it. So if th tag = then store text between i

Re: HTML::TokeParser

2003-02-13 Thread R. Joseph Newton
Dan Muey wrote: > That's exactly it, thank you very much! > One more tiny little problem, > > I have it grabbing the title, links and img tags perfectly except for one minor snafu > > It won't grab/parse img tags that are between tags, IE an image that is a link. > I tried having it parse 's fir

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey
Excellent, thanks! I'll look this over and give her a go! I appreciate your time and energy Dan > -Original Message- > From: david [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, February 12, 2003 3:36 PM > To: [EMAIL PROTECTED] > Subject: RE: HTML::TokeParser >

RE: HTML::TokeParser

2003-02-12 Thread david
ML is on the way to rescue. If you use get_token() instead of get_tag(), it might be easier. get_token() returns every token, and it is the programmer's responsibility to decide what to do with each one. get_tag() eats up the tokens you don't want, so it's tricky: #!/usr/bin/perl -w use s
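
The script in this message is cut off, so what follows is only a sketch of the get_token() approach it describes, not david's actual code. The helper name links_and_images and the sample markup are assumptions. The token layout is the one documented for HTML::TokeParser, where a start tag comes back as ["S", $tag, \%attr, \@attrseq, $text].

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: walk every token and report links, plus any images
# nested inside them. Returns a list of strings for easy inspection.
sub links_and_images {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    my @found;
    while (my $token = $p->get_token) {
        # Skip everything that is not a start tag (text, end tags, comments).
        next unless $token->[0] eq 'S';
        my ($tag, $attr) = @{$token}[1, 2];
        if ($tag eq 'a') {
            push @found, 'link: ' . (defined $attr->{href} ? $attr->{href} : '-');
        }
        elsif ($tag eq 'img') {
            # Because no tokens are skipped, an <img> inside <a>...</a> is seen too.
            push @found, 'img: ' . (defined $attr->{src} ? $attr->{src} : '-');
        }
    }
    return @found;
}

print "$_\n" for links_and_images(
    '<a href="/home"><img src="logo.png" alt="Logo"></a> <a href="/faq">FAQ</a>'
);
```

The trade-off is the one named above: get_token() hands back every token, so the filtering that get_tag() does implicitly has to be written out by hand.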

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey
print $token->[1]{href} || "what?","\n"; my $link_guts = $tok->get_trimmed_text("/a"); and then some how grab the 'src' and 'alt' attributes from each img tag in $link_guts if it's an image and the regular text if it's not and pr

RE: HTML::TokeParser

2003-02-12 Thread david
Dan Muey wrote: > > > The script you sent me does get the images, mine doesn't so I screwed up > somewhere along the way. I'll take your original and modify it one step at > a time to narrow down what I did wrong. > > I'll post back when I get it right so that hopefully someone can learn > fro

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey
ken->[1]{content} || "-"; > > That would grab content for all of them, > The name if it was a name and if it was an http-equiv it > would make come out as - > > Not sure what I'm doing wrong but any way here's my script :: Thanks > > #!/usr/bin/perl &

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey
ontent for all of them, The name if it was a name and if it was an http-equiv it would make come out as - Not sure what I'm doing wrong but any way here's my script :: Thanks #!/usr/bin/perl use LWP::Simple; use HTML::TokeParser; $url = $ARGV[0]; $content = get($url);

RE: HTML::TokeParser

2003-02-11 Thread david
to be working: #!/usr/bin/perl -w use strict; use HTML::TokeParser; my $tok = new HTML::TokeParser(*DATA) || die $!; while(1){ my $token = $tok->get_tag("a","img"); last unless($token); if($token->[0] eq 'a'){ print $token->

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey
> Dan Muey wrote: > > > > >> > >> I am trying to use HTML::TokeParser > >> From the cpan page for this I used this example : > >> > >> while (my $token = $p->get_tag("a")) { > >>

Re: HTML::TokeParser

2003-02-11 Thread Rob Dixon
Dan Muey wrote: >> I am trying to use HTML::TokeParser >> From the cpan page for this I used this example : >> >> while (my $token = $p->get_tag("a")) { >> my $url = $token->[1]{href} || "-"; >>

RE: HTML::TokeParser

2003-02-11 Thread david
Dan Muey wrote: > >> >> I am trying to use HTML::TokeParser >> From the cpan page for this I used this example : >> >> while (my $token = $p->get_tag("a")) { >> my $url = $token->[1]

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey
> > I am trying to use HTML::TokeParser > From the cpan page for this I used this example : > > while (my $token = $p->get_tag("a")) { > my $url = $token->[1]{href} || "-"; >

HTML::TokeParser

2003-02-11 Thread Dan Muey
I am trying to use HTML::TokeParser. From the CPAN page for this I used this example: while (my $token = $p->get_tag("a")) { my $url = $token->[1]{href} || "-"; my $text = $p->get_trimmed_text("

Re: HTML::TokeParser and &nbsp;

2003-01-30 Thread David Eason
I suppose &nbsp; is a browser directive, but TokeParser returns an actual character. Will research HTML::Entities' relationship to HTML::TokeParser later...probably much later, because now it's working. Dave

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread Rob Dixon
"John W. Krahn" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... > Rob Dixon wrote: > > > > "David Eason" <[EMAIL PROTECTED]> wrote in message > > [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... > > > I get a space in my editor output window. but when I run it from a c

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread John W. Krahn
Rob Dixon wrote: > > "David Eason" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED]... > > I get a space in my editor output window. but when I run it from a cmd > > window, I get the other character. (This is under Windows 2000 and > perl > > 5.8.0) > > Windows co

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread Rob Dixon
"David Eason" <[EMAIL PROTECTED]> wrote in message [EMAIL PROTECTED]">news:[EMAIL PROTECTED]... > I get a space in my editor output window. but when I run it from a cmd > window, I get the other character. (This is under Windows 2000 and perl > 5.8.0) Hi David Windows command prompt doesn't supp

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread David Eason
> AFAIK it is. > > > I have corrected the code below accordingly and it prints "line > > 1line 3" as desired. > > FWIW on my computer "\240" prints a "space". :-) > > > use strict; > > use warnings; > > use HTML::TokeParse

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread John W. Krahn
de below accordingly and it prints "line > 1line 3" as desired. FWIW on my computer "\240" prints a "space". :-) > use strict; > use warnings; > use HTML::TokeParser; > > my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!"

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread David Eason
breaking space Thanks, John, I had no idea where to look. I didn't know a non-breaking space was an actual character, I thought it was just a directive to the browser. I have corrected the code below accordingly and it prints "line 1line 3" as desired. use strict; use warnings; use
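
The corrected code is truncated here, so the following is only one possible way to deal with the character, not necessarily the fix used in the thread. It relies on the point made in this exchange: the parser decodes the non-breaking-space entity into a real character (octal 240, hex A0), which can then be mapped to an ordinary space. The helper name dd_texts and the sample markup are assumptions.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the text of each <dd>, with non-breaking
# spaces turned into ordinary spaces and runs of whitespace collapsed.
sub dd_texts {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    my @texts;
    while ($p->get_tag('dd')) {
        # get_text() returns the text up to the next tag, entities decoded.
        my $text = $p->get_text;
        $text =~ tr/\xA0/ /;     # non-breaking space becomes a plain space
        $text =~ s/\s+/ /g;      # collapse runs of whitespace
        $text =~ s/^ | $//g;     # trim a leading or trailing space
        push @texts, $text;
    }
    return @texts;
}

print "$_\n" for dd_texts('<dl><dd>line 1 &nbsp; line 3</dd></dl>');
```

Mapping the character to a space keeps the words apart; deleting it instead (tr/\xA0//d) is the alternative if the gap is unwanted.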

Re: HTML::TokeParser and &nbsp;

2003-01-28 Thread John W. Krahn
tween 'line 1' and 'line > 3'. Any idea what is going on? According to HTML::Entities # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96) copy => '©', # copyright sign reg=> '®', # registered sign nbsp => "

HTML::TokeParser and &nbsp;

2003-01-28 Thread David Eason
is going on? use strict; use warnings; use HTML::TokeParser; my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!"; while (my $tag = $p->get_tag()) { print $p->get_trimmed_text() if ($tag->[0] eq "dd") } __END__ __DATA__ line 1 &nbsp; line 3

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Jeff 'japhy' Pinyan
On Nov 25, sulfericacid said: >#!/usr/local/bin/perl -w > >my %form; >my $content; > >my $userurl = $form{'userurl'}; Where do you think %form gets populated? You want to use the CGI.pm module: use CGI 'param'; my $userurl = param('userurl'); >print "Content-type: text/html\n\n"; > >use di

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread sulfericacid
errors or logfiles, but it prints a blank screen. Anyone see why? Thanks. sulfericacid #!/usr/local/bin/perl -w my %form; my $content; my $userurl = $form{'userurl'}; print "Content-type: text/html\n\n"; use diagnostics; use strict; use LWP::Simple; $content = get($userurl)

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Ovid
--- sulfericacid <[EMAIL PROTECTED]> wrote: > use HTML::TokeParser > my $p = HTML::TokeParser->new(\$content); > > my %meta; > while (my $token = $p->get_token) { > next unless $token->[1] eq 'meta' && $token->[0] eq 'S'

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Bob Rasey
On Mon, 2002-11-25 at 11:30, Jeff 'japhy' Pinyan wrote: > On Nov 24, sulfericacid said: > > >use LWP::Simple > >use HTML::TokeParser > > You're missing semi-colons after those two 'use' statements. And while you're at it: $username =~ s/e/

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Jeff 'japhy' Pinyan
On Nov 24, sulfericacid said: >use LWP::Simple >use HTML::TokeParser You're missing semi-colons after those two 'use' statements. -- Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/ RPI Acacia brother #734 http://www.perlmonks.or

HTML::TokeParser/Parsing problem

2002-11-25 Thread sulfericacid
tent = get($form{'userurl'}); use HTML::TokeParser my $p = HTML::TokeParser->new(\$content); my %meta; while (my $token = $p->get_token) { next unless $token->[1] eq 'meta' && $token->[0] eq 'S'; $meta{$token->[2]->{

Re: Still can't extract data using HTML::TokeParser

2002-02-25 Thread Chris Ball
On Mon, Feb 25, 2002 at 02:29:58PM +1030, Daniel Falkenberg wrote: > my $content = $response->content; > $p = HTML::TokeParser->new($content) || die "Can't open: $!"; > while ($stream->get_tag("h1")) { $data = get_trimmed_text("/h1");} To sta
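
The reply is cut off, so this is a reading of the quoted snippet rather than a summary of the answer. Three things stand out in it: the constructor is given the string itself (which HTML::TokeParser treats as a file name) instead of a reference to it, the loop uses $stream although the parser is in $p, and get_trimmed_text is called as a plain function rather than as a method. A minimal sketch with those points changed follows; the helper name h1_texts and the sample markup are assumptions.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of every <h1> in an HTML string.
sub h1_texts {
    my ($content) = @_;
    # Pass a reference to the string; a plain string would be taken as a file name.
    my $p = HTML::TokeParser->new(\$content) or die "Can't parse: $!";
    my @headings;
    while ($p->get_tag('h1')) {
        # Call get_trimmed_text on the same parser object that found the tag.
        push @headings, $p->get_trimmed_text('/h1');
    }
    return @headings;
}

print "$_\n" for h1_texts('<h1> First </h1><p>body</p><h1>Second</h1>');
```

In the original script the string would come from $response->content; the sketch takes it as an argument so it can run without a network request.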

Still can't extract data using HTML::TokeParser

2002-02-24 Thread Daniel Falkenberg
Hey all, Just wondering why I still can't get HTML::TokeParser to either download that page I am looking for or at least store the HTML from the requested page. I know I could quite easily do this if I used HTML::Tableextract except the data I want is only about 3 lines of HTML and there a

Re: HTML::TokeParser

2002-02-24 Thread Peter Scott
At 11:43 AM 2/25/02 +1030, Daniel Falkenberg wrote: >Hello All, > >Is it possible for HTML::TokeParser to be able to work like >HTML::TreeBuilder. I don't want to have to download the webpage I just >want to be able to view its' HTML content Okay, I give up. How do y

HTML::TokeParser

2002-02-24 Thread Daniel Falkenberg
Hello All, Is it possible for HTML::TokeParser to be able to work like HTML::TreeBuilder. I don't want to have to download the webpage I just want to be able to view its' HTML content and then extract the data from there. If this isn't possible what would be my best bet for