On 12/28/2013 05:52 PM, Shawn Wilson wrote:
> The parser has done what it's supposed to. IDK if you can alter the
> encoding in it. Maybe you can and that's what you're looking for
> (encoding or character set). I'd first try binmode UTF-8 but you'll
> probably just end up handling this with a regex.
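A minimal sketch of the decode-first idea behind the binmode suggestion, assuming the page arrives as UTF-8 octets (the thread does not confirm the encoding). The helper name cell_texts and the sample markup are made up for illustration; decode() is from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::TokeParser;

# Hypothetical helper: return the trimmed text of every <td> cell.
# $bytes is assumed to hold raw UTF-8 octets as fetched from the server.
sub cell_texts {
    my ($bytes) = @_;
    my $html = decode('UTF-8', $bytes);    # octets -> Perl characters
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    my @cells;
    while ($p->get_tag('td')) {
        push @cells, $p->get_trimmed_text('/td');
    }
    return @cells;
}

binmode(STDOUT, ':encoding(UTF-8)');       # encode characters again on output
my $sample = "<table><tr><td>Temperature</td><td>3.2\xC2\xB0C</td></tr></table>";
print join(' ', cell_texts($sample)), "\n";
```

Decoding before parsing and encoding on output keeps the degree sign as one character all the way through, which is usually less fragile than patching the text with a regex afterwards.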
>If there is a better list for discussing HTML::TokeParser, I can post
>there. I have a code snippet which successfully extracts a piece of a
>web page. However, something goes south with the conversion to text.
>What should come out as the following
>
> Temperature 3.
If there is a better list for discussing HTML::TokeParser, I can post
there. I have a code snippet which successfully extracts a piece of a
web page. However, something goes south with the conversion to text.
What should come out as the following
Temperature 3.2°C
Humidity 94
element of the table that looks like this:
Length:
266.0m
i'm fine getting the url but i'm getting this four times when i try to run
the script:
Use of uninitialized value in string eq at ./vt-getlen.pl line 36,
line 2.
and the tokeparser looks like this:
my $parser = HT
ing for an element of the table that looks like this:
>
> Length:
> 266.0m
>
>
> i'm fine getting the url but i'm getting this four times when i try to run
> the script:
> Use of uninitialized value in string eq at ./vt-getlen.pl line 36,
> line 2.
>
ting the url but i'm getting this four times when i try to run
the script:
Use of uninitialized value in string eq at ./vt-getlen.pl line 36,
line 2.
and the tokeparser looks like this:
my $parser = HTML::TokeParser::Simple->new( string => $content )
or die "Can't de
Mathew Snyder wrote:
>
> I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
> for email addresses. However, I can't recall how I'm supposed to use
> HTML::TokeParser to get what I need. This is the pertinent part of the
> script:
>
Mumia W. wrote:
> On 12/19/2006 10:58 PM, Mathew Snyder wrote:
>> I have a script which runs WWW::Mechanize to obtain a page so it can be
>> parsed for email addresses. However, I can't recall how I'm supposed to use
>> HTML::TokeParser to get what I need. Th
On 12/19/2006 10:58 PM, Mathew Snyder wrote:
I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
for email addresses. However, I can't recall how I'm supposed to use
HTML::TokeParser to get what I need. This is the pertinent part of the script:
..
I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
for email addresses. However, I can't recall how I'm supposed to use
HTML::TokeParser to get what I need. This is the pertinent part of the script:
...
my $data = $agent->content();
my $parse
Ing. Branislav Gerzo [IBG], on Tuesday, November 22, 2005 at 13:42
(+0100) thinks about:
IBG> while(my $tag = $parser->get_tag('b')) {
IBG> my $text = $parser->get_text();
IBG> last if $text =~ /^(this and that|or that and this)/i;
IBG> }
IBG> my $text = $parser->get_text('b', 'b
Hello all,
I'm using this great module for parsing HTML files. But I run into
trouble - I need get clear unchanged HTML code. Common example of
using this module is (snippet):
while(my $tag = $parser->get_tag('b')) {
my $text = $parser->get_text();
last if $text =~ /^(this and t
Charles,
Thank you! It's working! I went w/ the... foreach approach...
my $parser = HTML::TokeParser::Simple->new(\$page) ||
die "Could not parse page";
my ($tag, $attr);
$tag = $parser->get_tag("table") foreach (1..10);
$parser->get_tag("
trict.
use strict;
: use HTML::TokeParser::Simple;
[snip]
: my $parser = HTML::TokeParser->new(\$page) ||
: die "Could not parse page";
Wrong object.
my $parser = HTML::TokeParser::Simple->new( \$page ) ||
die "
Hi All,
I'm trying to only get the text from w/in a certain table in the HTML
source. Right now I am getting all the text in the source. Here is my
script I made notes in the script.
#!/usr/bin/perl -w
use HTML::TokeParser::Simple;
use LWP::Simple;
my $url =
ied on your own. Anyway, I hope this
makes up for my negligence a bit.
I'm not sure that HTML::TokeParser::Simple adds anything to the
functionality of HTML::TokeParser for your purposes [at least what you have
described here]. The Simple part mostly has to do with making the tag types
and
On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote:
Someone want to show me how this module can help parse out html?
I want to grab text between text being able to apply a regexp to
get what I want.
The problem is my text is among 10,000 td tags. With the only
difference
being what
has in it.
>
> So if th tag = then store text between into an array.
>
> Paul
Have you looked into HTML::TokeParser? Might be a good place to start.
Joseph
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Someone want to show me how this module can help parse out html?
I want to grab text between text being able to apply a regexp to
get what I want.
The problem is my text is among 10,000 td tags. With the only difference
being what the above tag has in it.
So if th tag = then store text between i
Dan Muey wrote:
> That's exactly it, thank you very much!
> One more tiny little problem,
>
> I have it grabbing the title, links and img tags perfectly except for one minor snafu
>
> It won't grab/parse img tags that are between tags, IE an image that is a link.
> I tried having it parse 's fir
Excellent thanks! I'll look this over and give her a go!
I appreciate your time and energy
Dan
> -Original Message-
> From: david [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 12, 2003 3:36 PM
> To: [EMAIL PROTECTED]
> Subject: RE: HTML::TokeParser
ML is on the way to rescue. if you
use get_token() instead of get_tag(), it might be easier. get_token()
returns every token, and it is the programmer's responsibility to decide
what to do with each one. get_tag() eats up the tokens you don't want, so
it's tricky:
#!/usr/bin/perl -w
use s
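The get_token() approach described in this reply might look like the sketch below. It is not the script from the thread (that one is cut off here); the helper name links_and_images and the sample markup are invented for illustration. It relies only on the documented token layout, where a start tag arrives as ["S", $tag, $attr, $attrseq, $text].

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: walk every token and report <a> and <img> start tags,
# so an image nested inside a link is not skipped.
sub links_and_images {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    my @found;
    while (my $token = $p->get_token) {
        next unless $token->[0] eq 'S';        # only start tags carry attributes
        my ($tag, $attr) = @{$token}[1, 2];
        if ($tag eq 'a') {
            push @found, 'link: ' . ($attr->{href} // '-');
        }
        elsif ($tag eq 'img') {
            push @found, 'img: ' . ($attr->{src} // '-');
        }
    }
    return @found;
}

print "$_\n" for links_and_images('<a href="/home"><img src="logo.png" alt="Logo"></a>');
```

Because nothing is consumed behind your back, the img token inside the link is still there to inspect, at the cost of checking the token type yourself.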
print $token->[1]{href} || "what?","\n";
my $link_guts = $tok->get_trimmed_text("/a");
and then some how grab the 'src' and 'alt' attributes from each img tag in $link_guts
if it's an image and the regular text if it's not and pr
Dan Muey wrote:
>
>
> The script you sent me does get the images, mine doesn't so I screwed up
> somewhere along the way. I'll take your original and modify it one step at
> a time to narrow down what I did wrong.
>
> I'll post back when I get it right so that hopefully someone can learn
> fro
ken->[1]{content} || "-";
>
> That would grab content for all of them,
> The name if it was a name and if it was an http-equiv it
> would make come out as -
>
> Not sure what I'm doing wrong but any way here's my script :: Thanks
>
> #!/usr/bin/perl
ontent for all of them,
The name if it was a name and if it was an http-equiv it would make come out as -
Not sure what I'm doing wrong but any way here's my script ::
Thanks
#!/usr/bin/perl
use LWP::Simple;
use HTML::TokeParser;
$url = $ARGV[0];
$content = get($url);
to be working:
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;
my $tok = new HTML::TokeParser(*DATA) || die $!;
while(1){
my $token = $tok->get_tag("a","img");
last unless($token);
if($token->[0] eq 'a'){
print $token-&
> Dan Muey wrote:
>
> >
> >>
> >> I am trying to use HTML::TokeParser
> >> From the cpan page for this I used this example :
> >>
> >> while (my $token = $p->get_tag("a")) {
> >>
Dan Muey wrote:
>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>
>> while (my $token = $p->get_tag("a")) {
>> my $url = $token->[1]{href} || "-";
>>
Dan Muey wrote:
>
>>
>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>
>> while (my $token = $p->get_tag("a")) {
>> my $url = $token->[1]
>
> I am trying to use HTML::TokeParser
> From the cpan page for this I used this example :
>
> while (my $token = $p->get_tag("a")) {
> my $url = $token->[1]{href} || "-";
>
I am trying to use HTML::TokeParser
From the cpan page for this I used this example :
while (my $token = $p->get_tag("a")) {
my $url = $token->[1]{href} || "-";
my $text = $p->get_trimmed_text("
I suppose is a browser directive, but TokeParser returns an actual
character.
Will research HTML::Entities' relationship to HTML::TokeParser
later...probably much later, because now it's working.
Dave
"John W. Krahn" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Rob Dixon wrote:
> >
> > "David Eason" <[EMAIL PROTECTED]> wrote in message
> > news:[EMAIL PROTECTED]...
> > > I get a space in my editor output window. but when I run it from a
c
Rob Dixon wrote:
>
> "David Eason" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > I get a space in my editor output window. but when I run it from a cmd
> > window, I get the other character. (This is under Windows 2000 and
> perl
> > 5.8.0)
>
> Windows co
"David Eason" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> I get a space in my editor output window. but when I run it from a cmd
> window, I get the other character. (This is under Windows 2000 and
perl
> 5.8.0)
Hi David
Windows command prompt doesn't supp
> AFAIK it is.
>
> > I have corrected the code below accordingly and it prints "line
> > 1line 3" as desired.
>
> FWIW on my computer "\240" prints a "space". :-)
>
> > use strict;
> > use warnings;
> > use HTML::TokeParse
de below accordingly and it prints "line
> 1line 3" as desired.
FWIW on my computer "\240" prints a "space". :-)
> use strict;
> use warnings;
> use HTML::TokeParser;
>
> my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!"
breaking space
Thanks, John, I had no idea where to look. I didn't know a non-breaking
space was an actual character, I thought it was just a directive to the
browser. I have corrected the code below accordingly and it prints "line
1line 3" as desired.
use strict;
use warnings;
use
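One way to handle the decoded non-breaking space, sketched under the assumption that turning it into an ordinary space is acceptable for the output. This is not the corrected code from the message (which is cut off here); the helper name dd_texts and the sample markup are illustrative only.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: text of each <dd>, with no-break spaces (character
# 0xA0, the decoded form of &nbsp;) mapped to plain spaces and then trimmed.
sub dd_texts {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't parse: $!";
    my @out;
    while ($p->get_tag('dd')) {
        my $text = $p->get_trimmed_text('/dd');
        $text =~ tr/\xA0/ /;                # no-break space -> ordinary space
        $text =~ s/^\s+|\s+$//g;            # drop what is now plain whitespace
        push @out, $text if length $text;   # skip cells that held only &nbsp;
    }
    return @out;
}

print join(' ', dd_texts('<dl><dd>line 1</dd><dd>&nbsp;</dd><dd>line 3</dd></dl>')), "\n";
```

The parser has already turned the entity into a real character by the time get_trimmed_text() returns, so the cleanup has to match the character itself rather than the string "&nbsp;".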
tween 'line 1' and 'line
> 3'. Any idea what is going on?
According to HTML::Entities
# Some extra Latin 1 chars that are listed in the HTML3.2 draft
(21-May-96)
copy => '©', # copyright sign
reg=> '®', # registered sign
nbsp => "
is going on?
use strict;
use warnings;
use HTML::TokeParser;
my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!";
while (my $tag = $p->get_tag())
{
print $p->get_trimmed_text() if ($tag->[0] eq "dd")
}
__END__
__DATA__
line 1
line 3
On Nov 25, sulfericacid said:
>#!/usr/local/bin/perl -w
>
>my %form;
>my $content;
>
>my $userurl = $form{'userurl'};
Where do you think %form gets populated? You want to use the CGI.pm
module:
use CGI 'param';
my $userurl = param('userurl');
>print "Content-type: text/html\n\n";
>
>use di
errors or
logfiles, but it prints a blank screen. Anyone see why?
Thanks.
sulfericacid
#!/usr/local/bin/perl -w
my %form;
my $content;
my $userurl = $form{'userurl'};
print "Content-type: text/html\n\n";
use diagnostics;
use strict;
use LWP::Simple;
$content = get($userurl)
--- sulfericacid <[EMAIL PROTECTED]> wrote:
> use HTML::TokeParser
> my $p = HTML::TokeParser->new(\$content);
>
> my %meta;
> while (my $token = $p->get_token) {
> next unless $token->[1] eq 'meta' && $token->[0] eq 'S
On Mon, 2002-11-25 at 11:30, Jeff 'japhy' Pinyan wrote:
> On Nov 24, sulfericacid said:
>
> >use LWP::Simple
> >use HTML::TokeParser
>
> You're missing semi-colons after those two 'use' statements.
And while you're at it:
$username =~ s/e/
On Nov 24, sulfericacid said:
>use LWP::Simple
>use HTML::TokeParser
You're missing semi-colons after those two 'use' statements.
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.or
tent = get($form{'userurl'});
use HTML::TokeParser
my $p = HTML::TokeParser->new(\$content);
my %meta;
while (my $token = $p->get_token) {
next unless $token->[1] eq 'meta' && $token->[0] eq 'S';
$meta{$token->[2]->{
On Mon, Feb 25, 2002 at 02:29:58PM +1030, Daniel Falkenberg wrote:
> my $content = $response->content;
> $p = HTML::TokeParser->new($content) || die "Can't open: $!";
> while ($stream->get_tag("h1")) { $data = get_trimmed_text("/h1");}
To sta
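For reference, a hedged sketch of how the quoted lines could be written so that they run. The reply itself is cut off here, so this is not necessarily what it went on to say. Two documented points matter: new() treats a plain scalar as a file name and a scalar reference as the document itself, and get_trimmed_text() is a method of the parser object. The helper name h1_texts and the sample markup are invented.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: trimmed text of every <h1> in an HTML string.
sub h1_texts {
    my ($content) = @_;
    # Pass a reference: a plain scalar would be taken as a file name.
    my $p = HTML::TokeParser->new(\$content) or die "Can't parse: $!";
    my @titles;
    while ($p->get_tag('h1')) {
        # Call the method on the same object that get_tag() was called on.
        push @titles, $p->get_trimmed_text('/h1');
    }
    return @titles;
}

print "$_\n" for h1_texts('<h1> First  heading </h1><p>x</p><h1>Second</h1>');
```

With LWP the string would come from $response->content, or from $response->decoded_content if the character set should be handled as well.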
Hey all,
Just wondering why I still can't get HTML::TokeParser to either download
that page I am looking for or at least store the HTML from the requested
page. I know I could quite easily do this if I used HTML::TableExtract
except the data I want is only about 3 lines of HTML and there a
At 11:43 AM 2/25/02 +1030, Daniel Falkenberg wrote:
>Hello All,
>
>Is it possible for HTML::TokeParser to be able to work like
>HTML::TreeBuilder? I don't want to have to download the webpage, I just
>want to be able to view its HTML content
Okay, I give up. How do y
Hello All,
Is it possible for HTML::TokeParser to be able to work like
HTML::TreeBuilder? I don't want to have to download the webpage, I just
want to be able to view its HTML content and then extract the data from
there. If this isn't possible what would be my best bet for
51 matches