On Fri, Apr 05, 2002 at 05:15:08AM -0800, drieux wrote:

> 
> ### #!/usr/bin/perl
> ###
> ### use HTML::Parser;
> ### use HTML::FormatText;
> ### use HTML::TreeBuilder;
> ###
> ### my $html_text;
> ### my $filename = $ARGV[0];
> ### open(FH, $filename) or die "unable to open file $filename :$!\n";
> ### while (<FH>) { $html_text .= $_ ; }
> ### ###my $plain_text = HTML::FormatText->new->format(parse_html($html_text));
> ### my $tree = HTML::TreeBuilder->new->parse($html_text);
> ### my $plain_text = HTML::FormatText->new->format($tree);
> ###
> ### print "$plain_text\n";
> ###

I tried this code, and it did not work.

I also tried this code:

        use HTML::TreeBuilder;
        use HTML::FormatText;

        # Parse the saved page into an element tree.
        my $tree = HTML::TreeBuilder->new();
        $tree->parse_file("/tmp/cleanup");

        # Format the tree as plain text, wrapped at column 70.
        my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 70);
        #print $formatter->format($tree);
        my $ascii = $formatter->format($tree);

It also did not work.

The problem is that the filter deletes all of my text and outputs this:

[TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT SHOWN][TABLE NOT
SHOWN][TABLE NOT SHOWN]

I have tried it on five different files, all from the same
website. It appears that this module is broken, or at least that
it can't handle certain HTML (HTML that displays correctly in a
browser).
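
For what it's worth, the "[TABLE NOT SHOWN]" text appears to be a
placeholder that HTML::FormatText prints in place of each <table>
element, rather than a sign of a parse failure, so a page laid out
entirely with tables would come out with no text at all. Below is a
rough sketch of one possible workaround, not a confirmed fix: turn
each table cell into a paragraph and unwrap the remaining table
markup before formatting. The sample HTML and the html_to_text()
helper name are made up for illustration, and this has not been run
against the actual files from the website.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical sample input standing in for a table-based page.
my $html = <<'END_HTML';
<html><body>
<table>
<tr><td><p>First cell text.</p></td><td><p>Second cell text.</p></td></tr>
</table>
</body></html>
END_HTML

# html_to_text() is an illustrative helper, not part of any module.
sub html_to_text {
    my ($source) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($source);
    $tree->eof;

    # Turn each cell into a paragraph so its text stays separated
    # from the text of neighbouring cells.
    for my $cell_tag (qw(td th caption)) {
        $_->tag('p') for $tree->look_down(_tag => $cell_tag);
    }

    # Unwrap the rest of the table markup, leaving only its content,
    # so the formatter never sees a <table> element.
    for my $wrapper_tag (qw(tr tbody thead tfoot table)) {
        $_->replace_with_content for $tree->look_down(_tag => $wrapper_tag);
    }

    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 70);
    my $text = $formatter->format($tree);

    $tree->delete;    # release the tree's memory
    return $text;
}

print html_to_text($html);
```

This throws away the row and column layout, which is probably
acceptable when the tables are only used for page layout, but it
would scramble a table that holds real tabular data.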

I think I'm ready to try other filters (those not in perl).

Paul
 


-- 

************************
*Paul Tremblay         *
*[EMAIL PROTECTED]*
************************
