Well, for converting HTML to RTF, I believe Johan meant that you should use an
HTML parser AND an RTF generator to:
  - read the HTML file, watching for events;
  - when an event fires, check the event data, such as which tag fired it;
  - pass that info, along with the (tag) data, off to the RTF generator object.
This would be very similar to how the XML::SAX* modules work. I have only
worked with the XML::SAX* modules a few times, but basically, you write your
own handler package and use XML::SAX* to capture events in the source file. It
passes these events to your package's subs, where you can do conditional
processing based on which event was sent. You then pass the data wherever you
need it (usually an XML writer). Basically, this is a way to transform one XML
document into something else: XML, HTML, CSV, or whatever format you can write
out.
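
To make the event-driven idea concrete, here is a minimal sketch. It assumes
HTML::Parser (a CPAN module, not core Perl) for the events and writes raw RTF
control words by hand rather than using any particular RTF generator module.
The html2rtf name and the two tag tables are made up for illustration. Only
b, i, u and br are handled, and non-ASCII text would need extra escaping.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module; assumed to be installed

# Hypothetical tag-to-RTF tables: extend these to support more tags.
my %rtf_on  = (b => '\b ',  i => '\i ',  u => '\ul ');
my %rtf_off = (b => '\b0 ', i => '\i0 ', u => '\ul0 ');

sub html2rtf {
    my ($html) = @_;
    my $rtf = '{\rtf1\ansi ';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start-tag event: tag names arrive lowercased.
        start_h => [ sub {
            my ($tag) = @_;
            $rtf .= $tag eq 'br' ? '\line ' : ($rtf_on{$tag} // '');
        }, 'tagname' ],
        # End-tag event: switch the matching format off again.
        end_h => [ sub {
            my ($tag) = @_;
            $rtf .= $rtf_off{$tag} // '';
        }, 'tagname' ],
        # Text event: 'dtext' is the text with HTML entities decoded.
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s/([\\{}])/\\$1/g;   # escape RTF special characters
            $rtf .= $text;
        }, 'dtext' ],
    );

    $parser->parse($html);
    $parser->eof;
    return $rtf . '}';
}

print html2rtf('Plain <b>bold</b> &amp; <i>italic</i><br>next line'), "\n";
```

Changing the conversion rules then means editing the two tables or one
handler, not reworking a character-by-character loop.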
While going with this approach would take a little longer:
1) its main advantage is that it is easier to package into a real module to
share (hint)
2) it's extensible
3) with the events already defined in HTML and the output already defined in
RTF, it will be far less work to change the parsing rules than with the
roll-your-own approach taken in the sub below (you don't have to worry about.
I can "agree" with you on your point about RTF::Parser's lack of documentation,
but it is still a decent prebuilt package. Generally, "we" end up missing
something when we try to do manually what a module has already been built to
do.
I tried RTF::Parser's rtf2html.bat and found it did a very good job. Now,
granted, I did not pass anything odd into it, but it created very nice HTML
output.
Hope this helps.
Joe Frazier, Jr.
Technical Support Engineer
Peopleclick Service Support
Tel: +1-800-841-2365
E-Mail: mailto:[EMAIL PROTECTED]
> -----Original Message-----
> From: Ultimate Red Dragon [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 07, 2002 6:22 PM
> To: perl-win32-gui-users@lists.sourceforge.net
> Subject: [perl-win32-gui-users] Re: Re: RTF 2 HTML
>
>
> Well, in reply to Johan. I'll admit that I kinda knew those were there, but
> the documentation on them is either horrible or non-existent (depending on
> which RTF modules you look at.) As for the HTML2RTF, I know of no already
> existing interpreter, but I plan on using HTML::Parser to make it simpler.
>
> Anyway, I managed to get it to properly translate '<', '>' and '&' into
> their HTML counterparts. Please point out any bugs or suggestions you have.
>
> sub rtf2html{
>     my $re = $main->reDesc; #Just set this to the RichEdit object
>     my $oldtext = $re->Text();
>
>     # Record the position of every character that needs an HTML entity.
>     my @escapes;
>     foreach my $pair (['<','&lt;'], ['>','&gt;'], ['&','&amp;']){
>         my $temp = -1;
>         while(($temp = index($oldtext,$pair->[0],$temp+1)) != -1){
>             push(@escapes,[$temp,$pair->[1]]);
>         }
>     }
>
>     @escapes = sort({ $a->[0] <=> $b->[0] } @escapes);
>     foreach (@escapes){
>         print $_->[0]." = ".$_->[1]."\n"; # debug output
>     }
>
>     # Current formatting state (named so that $b is left to sort).
>     my ($italic, $bold, $underline) = (0, 0, 0);
>     my $text = '';
>
>     # Walk the text one character at a time, stopping at the last index.
>     foreach my $x (0..length($oldtext)-1){
>         $re->Select($x,$x+1);
>         my %att = $re->GetCharFormat();
>         # Emit a tag whenever a format attribute switches on or off.
>         if(($italic && !exists($att{-italic})) || (!$italic && exists($att{-italic}))){
>             $italic = $att{-italic};
>             $text .= ($italic ? '<i>' : '</i>');
>         }
>         if(($bold && !exists($att{-bold})) || (!$bold && exists($att{-bold}))){
>             $bold = $att{-bold};
>             $text .= ($bold ? '<b>' : '</b>');
>         }
>         if(($underline && !exists($att{-underline})) || (!$underline && exists($att{-underline}))){
>             $underline = $att{-underline};
>             $text .= ($underline ? '<u>' : '</u>');
>         }
>         # Checking @escapes first avoids autovivifying an empty entry.
>         if(@escapes && $x == $escapes[0]->[0]){
>             my $temp = shift(@escapes);
>             $text .= $temp->[1];
>         }else{
>             $text .= substr($oldtext,$x,1);
>         }
>     }
>     $text =~ s/\r//g;
>     $text =~ s/\n/<br>/g;
>     return $text;
> }
>
>
>
> Date: Thu, 07 Mar 2002 09:47:52 +0100
> To: perl-win32-gui-users@lists.sourceforge.net
> From: Johan Lindstrom <[EMAIL PROTECTED]>
> Subject: Re: [perl-win32-gui-users] RTF 2 HTML
>
> At 23:37 2002-03-06 -0500, Ultimate Red Dragon wrote:
> >It's not that great, I don't claim it's efficient, just that it works.
> >
> >Currently, it supports new lines, bold, italics and underline.
>
> This seems to be similar to what you want:
> http://search.cpan.org/search?dist=RTF-Parser
>
>
> >I'm working on converting < and > correctly, as well as a HTML 2 RTF sub
> >(or is there already one?)
>
> There are HTML parsers and RTF generators on CPAN.
>
> Here is the search for module names with RTF:
> http://search.cpan.org/search?mode