Hi Branden, On mar., mai 23 2023 at 02:15:33 , "G. Branden Robinson" <g.branden.robin...@gmail.com> wrote: >> > Myself, I wonder if K-P couldn't be implemented above the formatter >> > itself, using a diversion. We could then put the implementation in >> > an auxiliary macro package. [...] > I'll note that even if K-P can't be done this way, because the algorithm > requires a view of more than one page at a time
Indeed, I don't think it's possible to implement Knuth-Plass in a macro package, it's already very difficult to implement it into tree, but in a macro package it seems impossible. > (I, uh, admit I haven't actually learned how Knuth-Plass works yet) I've previously worked on Knuth-Plass [1], the old 'format_knuth_plass' branch no longer compiled so I've deleted it and pushed a fresh new 'format_knuth_plass2' fully sync against master. The best resource is of course Donald E. Knuth's "Digital Topography" book, chapter "Breaking Paragraph Into Lines". This link [2] is also very helpful with a visual description of how the algorithm works (In my previous post I've said that there was a mistake in the demerit calculation though, but I no longer remember what the problem exactly is). Going back to my 'format_knuth_plass2': it contains a paragraph.cpp file that fully implement the Knuth-Plass algorithm, but it is not yet connected to the troff program. Nevertheless the paragraph.cpp is fully tested by the 'test_paragraph' test program. You have to explicitly build it (it is not built by default): make test_paragraph It contains a unit test suite (using the CppUnit framework), if you launch it without option it will run all the unit tests. If you pass options '-s 1 -t 1' (suite 1 test 1) it will format the paragraph of the original Knuth example: -- Suite 1 Initialisation -- Test11... Checking the number of lines Checking the lines adjustement ratio Checking all breakpoints demerits Checking the best breakpoints array Number of lines: 10 | adj. ratio | total demerits | fitness class | In olden times when wishing still helped one, there lived a 0.774 2209 0 ^ king whose daughters were all beautiful; and the youngest was 0.179 2213 0 ^ ^ so beautiful that the sun itself, which has seen so much, was 0.629 2889 0 ^ ^ astonished whenever it shone in her face. Close by the king's 0.545 3178 0 ^ ^ castle lay a great dark forest, and under an old lime-tree in the 0.000 3179 0 ^ ^ ^ ^ forest was a well, and when the day was very warm, the king's 0.079 3180 0 ^ ^ ^ ^ ^ ^ child went out into the forest and sat down by the side of the 0.282 3189 0 ^ ^ ^ ^ ^ ^ cool fountain; and when she was bored she took a golden ball, 0.294 3205 0 ^ ^ ^ ^ ^ ^ and threw it up on high and caught it; and this ball was her 0.575 3605 0 ^ ^ ^ ^ ^ ^ ^ favorite plaything. 0.000 3606 0 ^ ^ ^ ^ -- Test test11_original_example PASSED It shows, for each line, the candidate breakpoints (with a `^' sign) and the calculation of "adjustment ratio" and "total demerits". The test checks the obtained result against the values given in Knuth's book. Note that Knuth later corrected his "demerit" formula, so this examples uses the old formula but not the other examples. You can also give to the 'test_paragraph' program any text in a file, and he will try to format the text in a paragraph. You can adjust the "tolerance" and the line width. For example if you use the attached 'lorem.txt' and do: test_paragraph -f lorem.txt it will first fail: Processing content of ../../lorem.txt with tolerance:1.000 line length:500 [paragraph.cpp:1072:format_knuth_plass]Could not format paragraph But you can play on the "tolerance" or the line length with -T and -l, e.g. pass a tolerance of 2.1: ./test_paragraph -f lorem.txt -T 2.1 Processing content of lorem.txt with tolerance:2.100 line length:500 Number of lines: 14 | adj. ratio | total demerits | fitness class | Lorem ipsum dolor sit amet, consectetur adipiscing elit. 2.000 641601 3 ^ Sed non risus. Suspendisse lectus tortor, dignissim sit amet, 1.000 651802 2 ^ ^ adipiscing nec, ultricies sed, dolor. Cras elementum ultrices 1.192 680702 3 ^ diam. Maecenas ligula massa, varius a, semper congue, euismod 0.100 690703 1 ^ non, mi. Proin porttitor, orci nec nonummy molestie, enim est 0.242 690707 1 ^ ^ eleifend mi, non fermentum diam nisl sit amet erat. Duis semper. -0.737 692388 0 ^ ^ Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. -0.833 695869 0 ^ ^ ^ ^ Pellentesque congue. Ut in risus volutpat libero pharetra tempor. -0.733 697469 0 ^ ^ ^ Cras vestibulum bibendum augue. Praesent egestas leo in pede. 0.000 697470 1 Praesent blandit odio eu enim. Pellentesque sed dui ut augue 0.633 698146 2 blandit sodales. Vestibulum ante ipsum primis in faucibus orci 0.296 698162 1 luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac -0.438 698243 1 mauris sed pede pellentesque fermentum. Maecenas adipiscing 0.667 699204 2 or keep the tolerance to 1.0 and use a line of 850: ./test_paragraph -f lorem.txt -l 850 Processing content of lorem.txt with tolerance:1.000 line length:850 Number of lines: 8 | adj. ratio | total demerits | fitness class | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim -0.423 10081 1 ^ sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius 0.365 10117 1 ^ a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non 0.175 10121 1 ^ ^ ^ fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, -0.200 10125 1 enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue. 0.229 10129 1 Praesent egestas leo in pede. Praesent blandit odio eu enim. Pellentesque sed dui ut augue blandit sodales. 0.185 10133 1 Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris -0.071 10134 1 ac mauris sed pede pellentesque fermentum. Maecenas adipiscing ante non diam sodales hendrerit. 0.000 10135 1 Of course the shorter the line, the harder it is for the algorithm to find a solution with the given tolerance. Note that 'test_pragraph' supports only ascii, the width of the various letter are hard-coded with values from Knuth'book, and there is no hyphenation support (only for Knuth's original example I simulate that we may have some words hyphenated). The main difficulties to connect this work to troff are that the three big files of troff (input.cpp, env.cpp, node.cpp) are too closely tied and that the whole line-oriented logic is difficult to adapt (particularly the diversion is hard to manage). Some preliminary refactoring would probably be needed. If people are interested I may try to find some time to work on it. [1] https://lists.gnu.org/archive/html/groff/2017-11/msg00079.html [2] https://defoe.sourceforge.net/folio/knuth-plass.html Regards, Bertrand
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue. Praesent egestas leo in pede. Praesent blandit odio eu enim. Pellentesque sed dui ut augue blandit sodales. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac mauris sed pede pellentesque fermentum. Maecenas adipiscing ante non diam sodales hendrerit.