Re: Wide character in print at HTML::TreeBuilder::XPath function

2020-01-26 Thread John SJ Anderson
> That, I am very grateful to report, solved that question. I guess the > scope of "use utf8;" is more narrow than I had thought. When you have a bit of time, sitting down and reading through https://github.com/rgs/p5-intelligible-unicode is a re
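The fix itself is cut off in the preview above, so what follows is only a minimal sketch of one common way to handle this warning, not necessarily what was suggested in the thread. It assumes the point about scope is that "use utf8;" only tells Perl the source file is UTF-8 and does nothing to the output handle. The variable names and sample string are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                       # only declares that THIS source file is UTF-8
use Encode qw(encode);

# Hypothetical sample text; any character above U+00FF can trigger
# "Wide character in print" on a handle with no encoding layer.
my $text = "Ссылка";

# Option 1: encode explicitly and work with the resulting byte string.
my $bytes = encode('UTF-8', $text);

# Option 2: put an encoding layer on the handle, then print characters.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

Either option alone is enough for a given handle; mixing both on the same output would encode the text twice.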

Re: Wide character in print at HTML::TreeBuilder::XPath function

2020-01-26 Thread Lars Noodén
On 1/26/20 9:58 AM, Lars Noodén wrote: > I've got a long script that has "use utf8;" near the top. The script > parses some HTML and then I run into trouble when printing the result as > shown below: > > use utf8; > use HTML::TreeBuilder::XPath;

Wide character in print at HTML::TreeBuilder::XPath function

2020-01-25 Thread Lars Noodén
I've got a long script that has "use utf8;" near the top. The script parses some HTML and then I run into trouble when printing the result as shown below: use utf8; use HTML::TreeBuilder::XPath; . . . my $xhtml = HTML::TreeBuilder::XPath->new; $xhtm

Re: trying to understand HTML::TreeBuilder::XPath

2013-01-29 Thread Rob Dixon
On 26/01/2013 20:44, Jeswin wrote: > Hi, > I'm trying to parse out the emails addresses from a webpage and I'm > using the HTML::TreeBuilder::XPath module. I don't really understand > XML and it's been a while since I worked with perl*. So far I mashed > up a co

Re: trying to understand HTML::TreeBuilder::XPath

2013-01-28 Thread Charles DeRykus
p to but not including the second double-quote: >> >> if( $link =~ /"mailto:([^"]+)/ ) { > > I've never used HTML::TreeBuilder::XPath, but I highly doubt that > the attr method would return the quotes (and if it did, they > could be single-quotes instead). It wo

Re: trying to understand HTML::TreeBuilder::XPath

2013-01-28 Thread Brandon McCaig
"mailto:([^"]+)/ ) { I've never used HTML::TreeBuilder::XPath, but I highly doubt that the attr method would return the quotes (and if it did, they could be single-quotes instead). It would probably be best to find a module that knows how to properly parse mailto URIs, but failing that I thin

Re: trying to understand HTML::TreeBuilder::XPath

2013-01-26 Thread Octavian Rasnita
From: "Jeswin" Hi, I'm trying to parse out the emails addresses from a webpage and I'm using the HTML::TreeBuilder::XPath module. I don't really understand XML and it's been a while since I worked with perl*. So far I mashed up a code by looking through past exa

Re: trying to understand HTML::TreeBuilder::XPath

2013-01-26 Thread Jim Gibson
On Jan 26, 2013, at 3:52 PM, Jim Gibson wrote: > However, if your program is successfully finding all of the tag sections > of the web page, and your only problem is distinguishing between email links > and other types of links, you can use regular expressions to detect mailto > links: > > m
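The regular expression in the preview above is cut off, so here is a rough sketch of the general idea rather than the pattern from the thread. It assumes the href values have already been extracted as plain strings without surrounding quotes (the concern raised elsewhere in this thread), and the sample links are invented.

```perl
use strict;
use warnings;

# Hypothetical href values, as they might come back from an attribute lookup.
my @links = (
    'mailto:someone@example.com',
    'http://example.com/contact.html',
    'mailto:other@example.org?subject=Hello',
);

my @emails;
for my $link (@links) {
    # Anchor at the start, ignore case in the scheme, and stop before any
    # "?subject=..." style parameters.
    if ( $link =~ /^mailto:([^?]+)/i ) {
        push @emails, $1;
    }
}

print "$_\n" for @emails;
```

This does not handle percent-encoding or multiple comma-separated recipients; a module that parses mailto URIs properly would be the safer choice for anything beyond a quick script.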

Re: trying to understand HTML::TreeBuilder::XPath

2013-01-26 Thread Jim Gibson
On Jan 26, 2013, at 12:44 PM, Jeswin wrote: > Hi, > I'm trying to parse out the emails addresses from a webpage and I'm > using the HTML::TreeBuilder::XPath module. I don't really understand > XML and it's been a while since I worked with perl*. So far I mashe

trying to understand HTML::TreeBuilder::XPath

2013-01-26 Thread Jeswin
Hi, I'm trying to parse out the emails addresses from a webpage and I'm using the HTML::TreeBuilder::XPath module. I don't really understand XML and it's been a while since I worked with perl*. So far I mashed up a code by looking through past examples online. The HTML port

Problem with HTML::TreeBuilder and look_down()

2010-06-14 Thread Craig
Hello All, I'm new to Perl, having only a week or two's experience, but experienced in other programming languages. I'm writing a script that will read a html file from disc and print the relevant parts for me. I have many html files all of them have a similar format, but some format variations ar

Re: HTML::TreeBuilder - handle invalid html gracefully

2009-08-23 Thread Roman Makurin
On Sun, Aug 23, 2009 at 02:56:44PM +0400, Roman Makurin wrote: > Hi All! > > How can I tell HTML::TreeBuilder to parse invalid html files > gracefully ? Here is an example: > > - > #!/usr/bin/perl > > use strict; > use warnings; > > use HTML::TreeBuilde

HTML::TreeBuilder - handle invalid html gracefully

2009-08-23 Thread Roman Makurin
Hi All! How can I tell HTML::TreeBuilder to parse invalid html files gracefully ? Here is an example: - #!/usr/bin/perl use strict; use warnings; use HTML::TreeBuilder; my $root = HTML::TreeBuilder->new_from_file(*DATA); print +($root->look_down(_tag=>'div', class

Re: HTML::TreeBuilder encode symbols as html entities

2009-08-14 Thread Roman Makurin
On Fri, Aug 14, 2009 at 5:35 PM, Shawn H. Corey wrote: > Roman Makurin wrote: >> >> dump result is html encoded entities: >> >> @0.1.5.1 >>  > title="Ссылка ">@0.1.5.1.0 >> >> all html entities are valid unicode code points of symb

Re: HTML::TreeBuilder encode symbols as html entities

2009-08-14 Thread Shawn H. Corey
Roman Makurin wrote: dump result is html encoded entities: @0.1.5.1 @0.1.5.1.0 all html entities are valid unicode code points of symbols. But why HTML::TreeBuilder convert symbols to entities ? Because some browsers do not understand Unicode. Or they didn't. If I just do

HTML::TreeBuilder encode symbols as html entities

2009-08-14 Thread Roman Makurin
Hi All. I have a problem with HTML::TreeBuilder. Here is sample code without any error checking: $ua = new LWP::UserAgent -timeout=>10; $resp = $ua->get($url); $content = decode('encoding_of_web_page', $resp->content); decode_entities($content); $r = HTML::TreeBuilde

Re: HTML::TreeBuilder help

2008-07-17 Thread Rob Dixon
Ryan wrote: > > The Dump method gives me this: > cellpadding="0" cellspacing="0" width="100%"> @0.1.1.0.1.0.0.0.0.1 > @0.1.1.0.1.0.0.0.0.1.0 >@0.1.1.0.1.0.0.0.0.1.0.0 > > How can I make use of "@0.1.1.0.1.0.0.0.0.1" if I know that's t

HTML::TreeBuilder help

2008-07-17 Thread Ryan
The Dump method gives me this: cellpadding="0" cellspacing="0" width="100%"> @0.1.1.0.1.0.0.0.0.1 @0.1.1.0.1.0.0.0.0.1.0 @0.1.1.0.1.0.0.0.0.1.0.0 How can I make use of "@0.1.1.0.1.0.0.0.0.1" if I know that's the element I want? Also i

Re[2]: HTML::TreeBuilder - finding a text element

2007-03-26 Thread Brandino Andreas
ok, thx a lot Sunday, March 25, 2007, 9:38:08 PM, you wrote: > Brandino Andreas wrote: >> >> Hi list >> I am using HTML::TreeBuilder to parse a html page and find a specific >> value. >> >> When i dump the array i get t

Re: HTML::TreeBuilder - finding a text element

2007-03-25 Thread Rob Dixon
Brandino Andreas wrote: Hi list I am using HTML::TreeBuilder to parse a html page and find a specific value. When i dump the array i get this: $tree->dump(); more.. @0.1.0.1.1.0.0.0.0 @0.1.0.1.1.0.0.0.0.0 "MAC

HTML::TreeBuilder - finding a text element

2007-03-25 Thread Brandino Andreas
Hi list I am using HTML::TreeBuilder to parse a html page and find a specific value. When i dump the array i get this: $tree->dump(); more.. @0.1.0.1.1.0.0.0.0 @0.1.0.1.1.0.0.0.0.0 "MAC Address" @0.1.

Re: HTML::TreeBuilder

2002-02-06 Thread Peter Scott
At 12:14 PM 2/7/02 +1030, Daniel Falkenberg wrote: >I am currently working with the perl CPAN module HTML::TreeBuilder. Is >someone able to explain to me if this is the module I want to use to >extract data from a HTML page. Now this HTML page is contained outside of >my per

HTML::TreeBuilder

2002-02-06 Thread Daniel Falkenberg
Hey all, I am currently working with the perl CPAN module HTML::TreeBuilder. Is someone able to explain to me if this is the module I want to use to extract data from a HTML page. Now this HTML page is contained outside of my perl-cgi script. From what I can gather the module will download

Re: HTML::Treebuilder

2001-12-20 Thread Michael Fowler
:Element documentation has an introduction to tree data structures. _Mastering Algorithms with Perl_ has some good information on data structures. It helps if you understand how HTML::TreeBuilder deals with HTML: it creates a tree data structure out of the elements. So, for example, the HTML code:
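To make the tree idea concrete, here is a small sketch using plain nested hashes. It is not the module's internal representation and does not call HTML::TreeBuilder at all; the structure, the tag names and the helper depth_of are invented for illustration. It assumes "depth" means the number of ancestors above an element, with the root at depth 0.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed document: each node has a tag name
# and a list of child nodes.
my $tree = {
    tag      => 'html',
    children => [
        { tag => 'head', children => [ { tag => 'title', children => [] } ] },
        { tag => 'body', children => [
            { tag => 'table', children => [ { tag => 'tr', children => [] } ] },
        ] },
    ],
};

# Return the depth of the first node with the given tag (depth-first),
# or undef if no such node exists.
sub depth_of {
    my ( $node, $tag, $depth ) = @_;
    $depth //= 0;
    return $depth if $node->{tag} eq $tag;
    for my $child ( @{ $node->{children} } ) {
        my $found = depth_of( $child, $tag, $depth + 1 );
        return $found if defined $found;
    }
    return;
}

print "table is at depth ", depth_of( $tree, 'table' ), "\n";
```

Here html is the root, body is one level below it, and the table sits one level below body.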

HTML::Treebuilder

2001-12-19 Thread McCollum, Frank
I do not understand what is meant by 'depth' in this module (i've read the accompanying documentation, but I didn't follow it well). Does anyone know where a good description is? I basically want to go to a website and figure out what the 'depth' is of a given table on that site, so that I can