On Tue, Apr 17, 2007 at 06:47:14PM +0200, Agustin Martin wrote:
> Hi, David and Sano,
>
> On Sat, Apr 14, 2007 at 02:07:45PM -0700, David Lawyer wrote:
> > Package: linuxdoc-tools
> > Version 0.9.21-0.5
> >
> > Please merge bug 175575 into this bug report since it's a subset of
> > the bug (and proposed fix) I'm now reporting.
> >
> > When I use sgml2txt I get both escape sequences and overstrikes which
> > plain text output shouldn't normally have. There is an -f option to
> > sgml2txt to eliminate the overstrikes.
>
> which by the way is buggy, and quite often does not remove all escapes,

I don't think the -f option is supposed to remove escapes. A few years
or so ago, the escape problem didn't exist. I think it was caused by a
change in the grotty program that made output with escapes the default.

> > It is very important to keep the
> > use of sgml2txt as simple as possible since the main advantage of
> > linuxdoc format over docbook is that it's simple and using the
> > linuxdoc-tools should also be simple. The escape sequences are only
> > for vt100 terminals (and the like) and will not display if one uses an
> > editor (like vim) or pager (like less or most) to read the file.
> > Overstrikes don't usually get displayed right either although some
> > pagers can deal with them for some cases (such as underline).
> >
> > So the default for conversion to text should (in my opinion) be just
> > plain text.
> >
> > The documentation for linuxdoc-tools fails to explain how to get
> > various types of text outputs using sgml2txt. It should. The way to
> > get plain text is to pass options to the grotty program from the sgml2txt
> > command line. Like this: sgml2txt --pass="-P-bcou". See "man grotty"
> > for how these 4 options (bcou) work together. To make this the
> > default, one could modify: /usr/share/linuxdoc-tools/dist/fmt_txt.pl
> > For example, this seems to work although I've never studied Perl:
> >
> > create_temp("$global->{tmpbase}.txt.1");
> > #next line added by DL (David Lawyer)
> > $global->{pass} = "-P-cbou" if $global->{pass} eq "";
> > $outfile = new FileHandle
> >   "|$main::progs->{GROFF} $global->{pass} -T $global->{charset} -t $main::progs->{GROFFMACRO} >\"$global->{tmpbase}.txt.1\"";
>
> Based on your proposed fix, I think something like in this diff
>
> ----------------------------------------------------------------
> @@ -329,6 +323,7 @@
>  {
>    my $infile = shift;
>    my ($outfile, $groffout);
> +  my $txtfilter = $txt->{filter} ? "-P-cbou" : "";
>
>    if ($txt->{manpage})
>      {
> @@ -338,7 +333,7 @@
>      {
>        create_temp("$global->{tmpbase}.txt.1");
>        $outfile = new FileHandle
> -        "|$main::progs->{GROFF} $global->{pass} -T $global->{charset} -t $main::progs->{GROFFMACRO} >\"$global->{tmpbase}.txt.1\"";
> +        "|$main::progs->{GROFF} $global->{pass} $txtfilter -T $global->{charset} -t $main::progs->{GROFFMACRO} >\"$global->{tmpbase}.txt.1\"";
>      }
>
>    #
> ------------------------------------------------------------------
>
> can be used to make the -f option work as expected.

But it's not expected to remove escapes. Also, you would need to delete
the old code that removes overstrikes from the output if -f is used (it
uses the s/ command in Perl). Suppose someone uses:

    sgml2txt -f --pass="-P-cu"

Then the -f will pass the -cbou options to grotty while the user only
wanted to pass -cu. My proposed patch would do just what the user
specified with --pass, and -f would still act as a filter and filter
out overstrikes. So in your solution -f and --pass both give options to
grotty and these options may conflict. -f is no longer a filter since it
just passes options to grotty.

One solution would be to use my "patch" and then have -f do nothing
except print a message that the use of -f is no longer needed.
Eventually -f could be eliminated or just do nothing for backwards
compatibility.

>
> I am generally not in favour of changing long-standing behaviors.

I don't think many people outside of LDP are using sgml2txt. And I
suspect those that are using it are likely using the -f option.

> However, in this case, escaped characters are of so limited use that
> it might be worth considering that.

And they were introduced by a change in grotty. So then one would use -f
to get plain text and without -f one would get overstrikes.

>
> I think a middle point is possible, making -f default for sgml2txt,
> but not for linuxdoc -B txt.
> This way, escaped chars can easily be
> obtained if really required (directly calling linuxdoc without the
> -f option), but plain text is obtained from calls to sgml2txt (that
> would be trivial to implement), with no option for the opposite
> behavior here. If we are flamed for this, we could reconsider the
> change. What do you think?

I don't think it's too good since then the two commands aren't the same.
But if you don't want to have the -B txt produce plain text, then it's
better than doing nothing. sgml2txt is the older command while the -B is
the newer. What about displaying a message when using the txt output to
let people know of the change? It already displays a short message, so
just add to that.

>
> > Instead of hard-coding -cbou options into the code as I've done
> > above, one could create a new variable, GROTTYOPTS, and set it
> > equal to -cbou in the main program:
> > /usr/share/linuxdoc-tools/LinuxDocTools.pm. I'm willing to do
> > some more work on this and create patches (I've never done Linux
> > patches before) provided of course that it's agreed that sgml2txt
> > should generate plain text by default.
>
> I am relatively new to perl, and far from being a perl guru. I might
> be missing a lot of important things here, but linuxdoc-tools perl
> seems to me extremely ancient, and there are a lot of things I do
> not understand why are done that way. As a matter of fact I am
> changing some things to what is IMHO more readable, and found no
> drawbacks yet.
>
> I mean that is probably not the kind of perl a new person will
> enjoy, but you are of course welcome.
>
> As I mentioned elsewhere, although I am not the maintainer of this
> package, I plan to keep improving it when possible, so I am happy to
> receive your feedback through the Debian BTS.
>
> Thanks for your help and suggestions,

And thanks for your quick response and effort on this.
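
P.S. To make the -f versus --pass point concrete, here is a rough shell
sketch of how the option string handed to groff would be put together
under each proposal. It is an illustration only, not the Perl in
fmt_txt.pl: the two function names are made up, and I am assuming the
option strings are simply concatenated the way the two patches show.

```shell
#!/bin/sh
# Illustration only: these helpers are hypothetical and are not part of
# linuxdoc-tools. They mimic how the grotty options given to groff would
# be assembled under each proposal discussed in this thread.

# Proposal A (default when --pass is empty): use the user's --pass value
# if one was given, otherwise fall back to -P-cbou.
opts_default_patch() {
    pass=$1
    if [ -z "$pass" ]; then
        pass="-P-cbou"
    fi
    printf '%s\n' "$pass"
}

# Proposal B (-f adds -P-cbou): keep the user's --pass value and append
# -P-cbou whenever -f was given ($2 is 1 for -f, anything else otherwise).
opts_filter_patch() {
    pass=$1
    filter=$2
    if [ "$filter" = 1 ]; then
        # ${pass:+...} avoids a leading space when --pass is empty
        pass="${pass:+$pass }-P-cbou"
    fi
    printf '%s\n' "$pass"
}

# The case from the message: sgml2txt -f --pass="-P-cu"
opts_default_patch "-P-cu"     # only the user's options reach grotty
opts_filter_patch "-P-cu" 1    # the user's options plus -P-cbou
```

With proposal A, grotty sees only what the user asked for and -f stays a
separate filter step. With proposal B, grotty sees both sets of options,
which is where the conflict can come from.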

>
> --
> Agustin

David Lawyer