URL: <https://savannah.gnu.org/bugs/?65585>
Summary: Problems introduced by commit cd9fde325f Group: GNU roff Submitter: deri Submitted: Fri 12 Apr 2024 03:51:08 PM UTC Category: Driver gropdf Severity: 3 - Normal Item Group: Incorrect behaviour Status: In Progress Privacy: Public Assigned to: deri Open/Closed: Open Discussion Lock: Any Planned Release: None _______________________________________________________ Follow-up Comments: ------------------------------------------------------- Date: Fri 12 Apr 2024 03:51:08 PM UTC By: Deri James <deri> I have been looking at the current state of pdf.tmac and have found a few issues. The changes which have been committed are:- Introduction of a new flag, -S, which replaces my change which introduced the concept of passing a single pipe character as a hotspot meant that the actual text to form the actual hotspot would be "piped" to the document stream, terminated by doing a markstop. This is what allowed Branden to implement the man macros which place contents into a diversion before emitting a hotspot. I have no objection to this change, although there were several ways of setting up a hotspot containing a single pipe character before this change, and most users are probably familiar with piping data. (Perhaps I should have used "<" as the single character). I would be very annoyed if someone sent me a pdf with a single "|" as a hotspot - would take me about 5 minutes to click it with the mouse! Another change was an attempt to only allow a restart after a preceding suspend. This is a good idea, but it is in the wrong place, it should be implemented in gropdf rather than pdf.tmac, since a user may use \X'pdf: markrestart' as documented in the gropdf man page. Which bypasses the check in pdf.tmac. Branden has implemented a looping construct to hold tags replacing an associative array. It is not used if the user is using mom macros (it does not work for mom documents). In fact it only works reliably for the man macros producing a "book" of man pages, where it is used to differenciate between an internal link to another page in the book or a "man:" URL to an external man page. The problem is that the new code does not implement:- .pdfhref L -D name Which should emit the descriptive text associated with the named tag at the time it was defined, but instead it just emits the "name". This is documented in pdfmark.pdf, and used several times in pdfmark.ms, so it is reasonable to assume users may be using this in their own documents if they use "pdfmom --roff -mspdf". So it is wrong to assume the new code is suitable for everything except mom. There is also an issue with the speed of the new code. One test file I used went from an elapsed time of 0m3.08s to 11m31.62s. (About 670 times slower!). This was producing a large document (LinuxManBook) from a single file, using the command:- time ~/groff-git/groff/test-pdfmom --roff -Tpdf -mandoc -petk LinuxManBook.trf -z (test-pdfmom is the same as test-groff but calling the pdfmom in the build directory). I don't know whether this slow down in groff will be acceptable to most users, I'm meant to be on a sabatical so I am not going to argue either way, but be aware that is Branden's attempt to solve one issue which was solved months ago in my branch waiting for merging. My use of stringhex was described as "obfuscation", which is rather insulting if the intention was to imply a deliberate attempt to obscure the purpose of my code. The correct interpretation of its use is to protect the data from interpretation by groff so that any byte sequence can be used as a string name in groff. This is analagous to using base64 to protect binary data in emails. Although Branden's loopy code solves some issues with 1.23.0 it fails in a number of ways which are dealt with successfully by the pdf.tmac in my branch (i.e. t.trf in #64576). There is a bug in Branden's code. The attached file, pdf-L.trf, illustrates the issue. According to the documentation in pdfmark.pdf, if you use .pdfhref L with a destination name but no descriptive text, the descriptive text given when the destination is named is used. With Branden's code, instead of using the descriptive text it uses the name of the destination instead. The two pdfs called pdf-L illustrate the problem. Another bug results in entries in the array Branden loops over get over-vritten in certain circumstances. This code illustrates the bug. .ig groff -Tpdf -dPDF.EXPORT=1 -z pdf-M.trf Results:- .ds pdf:bm1.tag one .ds pdf:bm1.tag two (The bm1.tag has been overwritten!!!) But, groff -Tpdf -dPDF.EXPORT=1 -dPRINTSTYLE=1 -z pdf-M.trf Results:- .ds pdf:look(one) Once upon... .ds pdf:look(two) This is two .. .pdfbookmark -T one 1 Once upon... .pdfhref M -D two -E This is two Setting "PRINTSTYLE=1" bypasses Branden's changes, because he uses the original code if mom macros are used. I don't understand why it is desirable to have two separate methods. I have fixed all the problems listed above, plus a speed up of Branden's loopy code. I have also included Keith's clever solution to polluting the pdf flags with data which can cause the errors about illegal characters in identifier names if the optional "--" marker not used. There is a minor regression in this code, not caught by any of the check tests, but when producing our pdf documents you will notice these new warnings:- gropdf:man/groff.7: warning: Can't convert 'rs' to unicode gropdf:../contrib/mom/examples/sample_docs.mom: warning: Can't convert 'em' to unicode They are caused because gropdf currently has no way of knowing what the UTF-16 for these groff characters. Previously the "em" character was handled by a:- .tr \[em]- In pdf.tmac. I was never happy with this "fudge" and the correct handling was part of my branch. I have changed the method slightly in view of what Branden wrote on the list. Afmtodit does not add an extra column to the font files it creates but adds the UTF-16 code as a comment. The groff_font man page has been amended appropriately. The grops standard fonts have also had the comment added, which corrects the regression. I am enjoying my sabbatical away from groff, but I thought I should correct these recent changes, since they would cause issues if released, but I would be grateful if other major changes to pdf production could wait until I am back, fighting fit to do any QA. Of course, if you find any problems with the code, please let me know through this bug report. _______________________________________________________ File Attachments: ------------------------------------------------------- Name: pdf-L.trf Size: 2KiB <https://file.savannah.gnu.org/file/pdf-L.trf?file_id=55936> ------------------------------------------------------- Name: pdf-L-DJ.pdf Size: 3KiB <https://file.savannah.gnu.org/file/pdf-L-DJ.pdf?file_id=55937> ------------------------------------------------------- Name: pdf-L-GBR.pdf Size: 3KiB <https://file.savannah.gnu.org/file/pdf-L-GBR.pdf?file_id=55938> AGPL NOTICE These attachments are served by Savane. You can download the corresponding source code of Savane at https://git.savannah.nongnu.org/cgit/administration/savane.git/snapshot/savane-4448d8da181e3133acac16b178e98513aa6405cd.tar.gz _______________________________________________________ Reply to this item at: <https://savannah.gnu.org/bugs/?65585> _______________________________________________ Message sent via Savannah https://savannah.gnu.org/