subject:"html\:\:tokeparser"

Re: HTML::TokeParser munging characters

2013-12-28 Thread Lars Noodén

On 12/28/2013 05:52 PM, Shawn Wilson wrote: > The parser has done what it's supposed to. IDK if you can alter the > encoding in it. Maybe you can and that's what you're looking for > (encoding or character set). I'd first try binmode UTF-8 but you'll > probably just end up handling this with a regex.
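The binmode suggestion above can be sketched roughly as follows. This is only an illustration, not the fix the thread settled on: the sample markup, the `$bytes` variable and the `first_paragraph_text` helper are hypothetical, and it assumes the page arrives as UTF-8 bytes. The idea is to decode the bytes into characters before parsing and to declare an encoding on the output handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Hypothetical input: raw UTF-8 bytes as they might arrive from a web server.
my $bytes = "<p>Temperature 3.2\xC2\xB0C</p>";

# Decode the bytes to characters, then return the trimmed text of the first <p>.
sub first_paragraph_text {
    my ($raw) = @_;
    my $html = decode('UTF-8', $raw);    # bytes -> Perl characters
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse document";
    $p->get_tag('p') or return;          # skip ahead to the first <p>
    return $p->get_trimmed_text('/p');   # text up to the closing </p>
}

# Tell Perl to encode characters as UTF-8 when printing.
binmode(STDOUT, ':encoding(UTF-8)');
print first_paragraph_text($bytes), "\n";
```

If the page is served in a different character set, the `decode` call would need that encoding name instead.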

Re: HTML::TokeParser munging characters

2013-12-28 Thread Shawn Wilson

te: >If there is a better list for discussing HTML::TokeParser, I can post >there. I have a code snippet which successfully extracts a piece of a >web page. However, something goes south with the conversion to text. >What should come out as the following > > Temperature 3.

HTML::TokeParser munging characters

2013-12-28 Thread Lars Noodén

If there is a better list for discussing HTML::TokeParser, I can post there. I have a code snippet which successfully extracts a piece of a web page. However, something goes south with the conversion to text. What should come out as the following Temperature 3.2°C Humidity 94

Re: html::tokeparser

2010-12-07 Thread Rob Dixon

element of the table that looks like this: Length: 266.0m i'm fine getting the url but i'm getting this four times when i try to run the script: Use of uninitialized value in string eq at ./vt-getlen.pl line 36, line 2. and the tokeparser looks like this: my $parser = HT

Re: html::tokeparser

2010-12-07 Thread Jim Gibson

ing for an element of the table that looks like this: > > Length: > 266.0m > > > i'm fine getting the url but i'm getting this four times when i try to run > the script: > Use of uninitialized value in string eq at ./vt-getlen.pl line 36, > line 2. >

html::tokeparser

2010-12-07 Thread shawn wilson

ting the url but i'm getting this four times when i try to run the script: Use of uninitialized value in string eq at ./vt-getlen.pl line 36, line 2. and the tokeparser looks like this: my $parser = HTML::TokeParser::Simple->new( string => $content ) or die "Can't de

Re: HTML::TokeParser question

2006-12-20 Thread Rob Dixon

Mathew Snyder wrote: > > I have a script which runs WWW::Mechanize to obtain a page so it can be parsed > for email addresses. However, I can't recall how I'm supposed to use > HTML::TokeParser to get what I need. This is the pertinent part of the > script: >

Re: HTML::TokeParser question

2006-12-20 Thread Rob Dixon

Mumia W. wrote: > On 12/19/2006 10:58 PM, Mathew Snyder wrote: >> I have a script which runs WWW::Mechanize to obtain a page so it can be >> parsed for email addresses. However, I can't recall how I'm supposed to use >> HTML::TokeParser to get what I need. Th

Re: HTML::TokeParser question

2006-12-20 Thread Mumia W.

On 12/19/2006 10:58 PM, Mathew Snyder wrote: I have a script which runs WWW::Mechanize to obtain a page so it can be parsed for email addresses. However, I can't recall how I'm supposed to use HTML::TokeParser to get what I need. This is the pertinent part of the script: ..

HTML::TokeParser question

2006-12-19 Thread Mathew Snyder

I have a script which runs WWW::Mechanize to obtain a page so it can be parsed for email addresses. However, I can't recall how I'm supposed to use HTML::TokeParser to get what I need. This is the pertinent part of the script: ... my $data = $agent->content(); my $parse

Re: HTML::TokeParser, get HTML

2005-11-22 Thread Ing. Branislav Gerzo

Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42 (+0100) thinks about: IBG> while(my $tag = $parser->get_tag('b')) { IBG> my $text = $parser->get_text(); IBG> last if $text =~ /^(this and that|or that and this)/i; IBG> } IBG> my $text = $parser->get_text('b', 'b

HTML::TokeParser, get HTML

2005-11-22 Thread Ing. Branislav Gerzo

Hello all, I'm using this great module for parsing HTML files. But I run into trouble - I need get clear unchanged HTML code. Common example of using this module is (snippet): while(my $tag = $parser->get_tag('b')) { my $text = $parser->get_text(); last if $text =~ /^(this and t

RE: HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Brian Volk

Charles, Thank you! It's working! I went w/ the... foreach approach... my $parser = HTML::TokeParser::Simple->new(\$page) || die "Could not parse page"; my ($tag, $attr); $tag = $parser->get_tag("table") foreach (1..10); $parser->get_tag("

RE: HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Charles K. Clarkson

trict. use strict; : use HTML::TokeParser::Simple; [snip] : my $parser = HTML::TokeParser->new(\$page) || : die "Could not parse page"; Wrong object. my $parser = HTML::TokeParser::Simple->new( \$page ) || die "

HTML::TokeParser::Simple / get_attr

2004-11-24 Thread Brian Volk

Hi All, I'm trying to only get the text from w/in a certain table in the HTML source. Right now I am getting all the text in the source. Here is my script I made notes in the script. #!/usr/bin/perl -w use HTML::TokeParser::Simple; use LWP::Simple; my $url =

Re: Html::tokeparser::simple

2003-11-28 Thread R. Joseph Newton

ied on your own. Anyway, I hope this makes up for my negligence a bit. I'm not sure that HTML::TokeParser::Simple adds anything to the functionality of HTML::TokeParser for your purposes [at least what you have described here]. The Simple part mostly has to do with making the tag types and

Re: Html::tokeparser::simple

2003-11-26 Thread drieux

On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote: Someone want to show me how this module can help parse out html? I want to grab text between text being able to apply regexp to get what I want. The problem is my text is among 10,000 td tags. With the only difference being what

Re: Html::tokeparser::simple

2003-11-26 Thread R. Joseph Newton

has in it. > > So if th tag = then store text between into an array. > > Paul Have you looked into HTML::TokeParser? Might be a good place to start. Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Html::tokeparser::simple

2003-11-26 Thread Paul Kraus

Someone want to show me how this module can help parse out html? I want to grab text between text being able to apply regexp to get what I want. The problem is my text is among 10,000 td tags. With the only difference being what the above tag has in it. So if th tag = then store text between i

Re: HTML::TokeParser

2003-02-13 Thread R. Joseph Newton

Dan Muey wrote: > That's exactly it, thank you very much! > One more tiny little problem, > > I have it grabbing the title, links and img tags perfectly except for one minor snafu > > It won't grab/parse img tags that are between tags, IE an image that is a link. > I tried having it parse 's fir

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey

Excellent thanks! I'll look this over and give her a go! I appreciate your time and energy Dan > -Original Message- > From: david [mailto:[EMAIL PROTECTED]] > Sent: Wednesday, February 12, 2003 3:36 PM > To: [EMAIL PROTECTED] > Subject: RE: HTML::TokeParser >

RE: HTML::TokeParser

2003-02-12 Thread david

ML is on the way to rescue. if you use get_token() instead of get_tag(), it might be easier. get_token() returns every token, and it will be the programmer's responsibility to decide which tokens to use. get_tag() eats up the tokens you don't want so it's tricky: #!/usr/bin/perl -w use s
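A rough sketch of the get_token() approach described above, assuming the goal is to pick up both links and images that sit inside links. The sample markup and the `links_and_images` helper are hypothetical and are not the script from the original message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup: one text link and one image wrapped in a link.
my $html = '<a href="/one">First</a> <a href="/two"><img src="pic.png" alt="Pic"></a>';

# Walk every token in document order and keep the <a> and <img> start tags.
sub links_and_images {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot parse document";
    my @found;
    while (my $token = $p->get_token) {
        # Start-tag tokens look like ["S", $tag, \%attr, \@attrseq, $text].
        next unless $token->[0] eq 'S';
        my ($tag, $attr) = @{$token}[1, 2];
        push @found, "a:$attr->{href}"  if $tag eq 'a'   && defined $attr->{href};
        push @found, "img:$attr->{src}" if $tag eq 'img' && defined $attr->{src};
    }
    return @found;
}

print "$_\n" for links_and_images($html);
```

Because get_token() hands back every token, nothing between the opening and closing link tags is skipped, which is why the nested image is still seen here.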

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey

print $token->[1]{href} || "what?","\n"; my $link_guts = $tok->get_trimmed_text("/a"); and then some how grab the 'src' and 'alt' attributes from each img tag in $link_guts if it's an image and the regular text if it's not and pr

RE: HTML::TokeParser

2003-02-12 Thread david

Dan Muey wrote: > > > The script you sent me does get the images, mine doesn't so I screwed up > somewhere along the way. I'll take your original and modify it one step at > a time to narrow down what I did wrong. > > I'll post back when I get it right so that hopefully someone can learn > fro

RE: HTML::TokeParser

2003-02-12 Thread Dan Muey

ken->[1]{content} || "-"; > > That would grab content for all of them, > The name if it was a name and if it was an http-equiv it > would make come out as - > > Not sure what I'm doing wrong but any way here's my script :: Thanks > > #!/usr/bin/perl

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey

ontent for all of them, The name if it was a name and if it was an http-equiv it would make come out as - Not sure what I'm doing wrong but any way here's my script :: Thanks #!/usr/bin/perl use LWP::Simple; use HTML::TokeParser; $url = $ARGV[0]; $content = get($url);

RE: HTML::TokeParser

2003-02-11 Thread david

to be working: #!/usr/bin/perl -w use strict; use HTML::TokeParser; my $tok = new HTML::TokeParser(*DATA) || die $!; while(1){ my $token = $tok->get_tag("a","img"); last unless($token); if($token->[0] eq 'a'){ print $token->

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey

> Dan Muey wrote: > > > > >> > >> I am trying to use HTML::TokeParser > >> From the cpan page for this I used this example : > >> > >> while (my $token = $p->get_tag("a")) { > >>

Re: HTML::TokeParser

2003-02-11 Thread Rob Dixon

Dan Muey wrote: >> I am trying to use HTML::TokeParser >> From the cpan page for this I used this example : >> >> while (my $token = $p->get_tag("a")) { >> my $url = $token->[1]{href} || "-"; >>

RE: HTML::TokeParser

2003-02-11 Thread david

Dan Muey wrote: > >> >> I am trying to use HTML::TokeParser >> From the cpan page for this I used this example : >> >> while (my $token = $p->get_tag("a")) { >> my $url = $token->[1]

RE: HTML::TokeParser

2003-02-11 Thread Dan Muey

> > I am trying to use HTML::TokeParser > From the cpan page for this I used this example : > > while (my $token = $p->get_tag("a")) { > my $url = $token->[1]{href} || "-"; >

HTML::TokeParser

2003-02-11 Thread Dan Muey

I am trying to use HTML::TokeParser From the cpan page for this I used this example : while (my $token = $p->get_tag("a")) { my $url = $token->[1]{href} || "-"; my $text = $p->get_trimmed_text("

Re: HTML::TokeParser and &nbsp;

2003-01-30 Thread David Eason

I suppose &nbsp; is a browser directive, but TokeParser returns an actual character. Will research HTML::Entities' relationship to HTML::TokeParser later...probably much later, because now it's working. Dave

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread Rob Dixon

"John W. Krahn" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]... > Rob Dixon wrote: > > > > "David Eason" <[EMAIL PROTECTED]> wrote in message > > news:[EMAIL PROTECTED]... > > > I get a space in my editor output window. but when I run it from a c

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread John W. Krahn

Rob Dixon wrote: > > "David Eason" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED]... > > I get a space in my editor output window. but when I run it from a cmd > > window, I get the other character. (This is under Windows 2000 and > perl > > 5.8.0) > > Windows co

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread Rob Dixon

"David Eason" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]... > I get a space in my editor output window. but when I run it from a cmd > window, I get the other character. (This is under Windows 2000 and perl > 5.8.0) Hi David Windows command prompt doesn't supp

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread David Eason

> AFAIK it is. > > > I have corrected the code below accordingly and it prints "line > > 1line 3" as desired. > > FWIW on my computer "\240" prints a "space". :-) > > > use strict; > > use warnings; > > use HTML::TokeParse

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread John W. Krahn

de below accordingly and it prints "line > 1line 3" as desired. FWIW on my computer "\240" prints a "space". :-) > use strict; > use warnings; > use HTML::TokeParser; > > my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!"

Re: HTML::TokeParser and &nbsp;

2003-01-29 Thread David Eason

breaking space Thanks, John, I had no idea where to look. I didn't know a non-breaking space was an actual character, I thought it was just a directive to the browser. I have corrected the code below accordingly and it prints "line 1line 3" as desired. use strict; use warnings; use
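As a hedged illustration of the point above: the parser decodes the `&nbsp;` entity to a real character, code point 0xA0 (octal 240), so it has to be handled like any other character in the text. The corrected code from the thread is not visible in this snippet; the sample markup and the `dd_texts` helper below are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup: the middle <dd> holds only a non-breaking space.
my $html = '<dl><dd>line 1</dd><dd>&nbsp;</dd><dd>line 3</dd></dl>';

# Return the text of each <dd>, dropping entries that are only whitespace
# once the non-breaking space has been mapped to an ordinary space.
sub dd_texts {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot parse document";
    my @out;
    while ($p->get_tag('dd')) {
        my $text = $p->get_text('/dd');   # entities arrive already decoded
        $text =~ tr/\xA0/ /;              # non-breaking space -> plain space
        $text =~ s/^\s+|\s+$//g;          # trim leading and trailing whitespace
        push @out, $text if length $text;
    }
    return @out;
}

print join('', dd_texts($html)), "\n";
```

Mapping 0xA0 explicitly avoids relying on whether `\s` matches that character, which can vary with how the string is stored.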

Re: HTML::TokeParser and &nbsp;

2003-01-28 Thread John W. Krahn

tween 'line 1' and 'line > 3'. Any idea what is going on? According to HTML::Entities # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96) copy => '©', # copyright sign reg=> '®', # registered sign nbsp => "

HTML::TokeParser and &nbsp;

2003-01-28 Thread David Eason

is going on? use strict; use warnings; use HTML::TokeParser; my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!"; while (my $tag = $p->get_tag()) { print $p->get_trimmed_text() if ($tag->[0] eq "dd") } __END__ __DATA__ line 1 line 3

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Jeff 'japhy' Pinyan

On Nov 25, sulfericacid said: >#!/usr/local/bin/perl -w > >my %form; >my $content; > >my $userurl = $form{'userurl'}; Where do you think %form gets populated? You want to use the CGI.pm module: use CGI 'param'; my $userurl = param('userurl'); >print "Content-type: text/html\n\n"; > >use di

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread sulfericacid

errors or logfiles, but it prints a blank screen. Anyone see why? Thanks. sulfericacid #!/usr/local/bin/perl -w my %form; my $content; my $userurl = $form{'userurl'}; print "Content-type: text/html\n\n"; use diagnostics; use strict; use LWP::Simple; $content = get($userurl)

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Ovid

--- sulfericacid <[EMAIL PROTECTED]> wrote: > use HTML::TokeParser > my $p = HTML::TokeParser->new(\$content); > > my %meta; > while (my $token = $p->get_token) { > next unless $token->[1] eq 'meta' && $token->[0] eq 'S'

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Bob Rasey

On Mon, 2002-11-25 at 11:30, Jeff 'japhy' Pinyan wrote: > On Nov 24, sulfericacid said: > > >use LWP::Simple > >use HTML::TokeParser > > You're missing semi-colons after those two 'use' statements. And while you're at it: $username =~ s/e/

Re: HTML::TokeParser/Parsing problem

2002-11-25 Thread Jeff 'japhy' Pinyan

On Nov 24, sulfericacid said: >use LWP::Simple >use HTML::TokeParser You're missing semi-colons after those two 'use' statements. -- Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/ RPI Acacia brother #734 http://www.perlmonks.or

HTML::TokeParser/Parsing problem

2002-11-25 Thread sulfericacid

tent = get($form{'userurl'}); use HTML::TokeParser my $p = HTML::TokeParser->new(\$content); my %meta; while (my $token = $p->get_token) { next unless $token->[1] eq 'meta' && $token->[0] eq 'S'; $meta{$token->[2]->{

Re: Still can't extract data using HTML::TokeParser

2002-02-25 Thread Chris Ball

On Mon, Feb 25, 2002 at 02:29:58PM +1030, Daniel Falkenberg wrote: > my $content = $response->content; > $p = HTML::TokeParser->new($content) || die "Can't open: $!"; > while ($stream->get_tag("h1")) { $data = get_trimmed_text("/h1");} To sta

Still can't extract data using HTML::TokeParser

2002-02-24 Thread Daniel Falkenberg

Hey all, Just wondering why I still can't get HTML::TokeParser to either download that page I am looking for or at least store the HTML from the requested page. I know I could quite easily do this if I used HTML::Tableextract except the data I want is only about 3 lines of HTML and there a

Re: HTML::TokeParser

2002-02-24 Thread Peter Scott

At 11:43 AM 2/25/02 +1030, Daniel Falkenberg wrote: >Hello All, > >Is it possible for HTML::TokeParser to be able to work like >HTML::TreeBuilder. I don't want to have to download the webpage I just >want to be able to view its' HTML content Okay, I give up. How do y

HTML::TokeParser

2002-02-24 Thread Daniel Falkenberg

Hello All, Is it possible for HTML::TokeParser to be able to work like HTML::TreeBuilder. I don't want to have to download the webpage I just want to be able to view its' HTML content and then extract the data from there. If this isn't possible what would be my best bet for

51 matches
