12.02.2020 23:58, Tom Lane wrote:
> Alexander Lakhin <exclus...@gmail.com> writes:
>> Please look at a less invasive approach that we use at Postgres Pro for
>> some time (mainly for improving the translated documentation, but it
>> works for the original one too). The idea is to add zero-width spaces
>> after/before some chars ('(', ',', '[', etc) to let fop split lines
>> where desired. It has one disadvantage - it's not search-friendly
>> (though maybe that is application-dependent).
>> But if it's feasible, I think this approach can at least complement a
>> manual tables reformatting. Decreasing a font size in the tables seems
>> appropriate to me too.
> Hmm, interesting proposal. I experimented and verified that injecting
> zero-width space (&#x200B;) does allow line breaking to occur in both
> HTML and PDF output, so this could be a route to improving the situation
> for overlength example texts. I do not think I like the idea of
> automatically injecting tons of them, though. As you say, it might
> hinder searching; and it would allow some silly breaks; and there are
> cases where it still wouldn't find a break, such as the examples for
> sha256() et al. I'd be happier about manually inserting breaks just
> in the places we really need them. To keep the source readable, I'd
> want to write something like "&zwsp;" not a numeric entity code,
> but it looks like we can define custom entities if we want.

Yes, I started with manual &zwsp; insertions in the translation, but later reduced them to just a few dozen. (For example, we still have "3.1415926535&zwsp;8979323846" in the translation.)

The main issue with the manual approach was that I needed to recheck the zwsp placement on every update, and I can't see where a break is needed until I generate the PDF. Fortunately, fop prints a warning like this:

[WARN] FOUserAgent - The contents of fo:block line 2 exceed the available area in the inline-progression direction by 22725 millipoints. (See position 127769:983)

It's not very user-friendly, but still useful when there are only one or two of them. (For now, I see 559 such warnings in REL_12_STABLE.)

The second issue is that the placement can depend on the page size, and in fact most of those zwsps are not needed for HTML or other formats (moreover, some formats may require different placements, unless we simply implement some common rules).

The third (minor) issue is with translation: when I see a break in the English source, e.g. "split_part('abc~@~def&zwsp;~@~ghi', '~@~', 2)", should I leave the break in the same place, or is it better to move it because the adjacent text has a different length and the table columns have different widths?

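
With hundreds of such warnings, a small script can rank them so that the worst overflows are looked at first. The following is only a rough sketch in Python: the pattern is modelled on the single warning quoted above (other fop versions may word it differently), parse_overflows is a hypothetical helper name, and reading "See position" as line:column in the intermediate .fo file is an assumption.

```python
import re

# Pattern modelled on the one warning quoted in this message (assumption:
# other fop versions or other FO elements may produce different wording).
OVERFLOW_RE = re.compile(
    r"\[WARN\] FOUserAgent - The contents of (?P<element>fo:[\w-]+) "
    r"line (?P<block_line>\d+) exceed the available area in the "
    r"inline-progression direction by (?P<millipoints>\d+) millipoints\.\s*"
    r"\(See position (?P<pos_line>\d+):(?P<pos_col>\d+)\)"
)


def parse_overflows(log_text):
    """Return the overflow warnings found in log_text, largest overflow first.

    parse_overflows is a hypothetical helper, not part of fop.
    """
    found = []
    for m in OVERFLOW_RE.finditer(log_text):
        found.append({
            "element": m.group("element"),
            "millipoints": int(m.group("millipoints")),
            # Assumed to be line:column in the generated .fo file.
            "pos_line": int(m.group("pos_line")),
            "pos_col": int(m.group("pos_col")),
        })
    return sorted(found, key=lambda w: w["millipoints"], reverse=True)
```

Feeding it the captured fop output (for example, the text of a saved build log) gives a list that can be walked from the widest overflow down; the \s* before "(See position" tolerates the position part landing on the next line.
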
For me, this approach expresses the belief that line breaking rules should be slightly different in our context. For example, a line break after an opening bracket is feasible and common in function calls and declarations. Maybe the rules in the proposed XSLT could be improved or restricted, but I think that if fop allowed us to enable an imaginary 'programming language line breaking rules' mode, we would use it for our tables (some or all). Maybe some of the rules could be implemented explicitly in the DocBook source, just to avoid tons of zwsp in the generated output, or the "fo:table-cell/fo:block//text()" condition could be refined to filter out some (text-only?) tables, but I think that the idea of our own specific line breaking rules could work.

Best regards,
Alexander
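
P.S. To make the idea of "break after certain characters" concrete, here is a minimal sketch in Python. It is for illustration only and is not the XSLT discussed above: the names add_break_hints and strip_break_hints and the character set "(,[" are made up for this example, and real rules would probably need to be more selective.

```python
ZWSP = "\u200b"        # ZERO WIDTH SPACE, a permitted line-break point
BREAK_AFTER = "(,["    # illustrative subset of characters, not a real rule set


def add_break_hints(text, break_after=BREAK_AFTER):
    """Insert a zero-width space after each character listed in break_after.

    No hint is added at the end of the string, before whitespace (a break
    is already possible there), or where a hint is already present.
    """
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in break_after and nxt and not nxt.isspace() and nxt != ZWSP:
            out.append(ZWSP)
    return "".join(out)


def strip_break_hints(text):
    """Remove the hints again, e.g. before indexing the text for search."""
    return text.replace(ZWSP, "")
```

For "f(a,b[1])" this adds a hint after "(", "," and "[". A long hex digest such as a sha256() result contains none of these characters and is returned unchanged, which is the case where such rules still find no break. Stripping the hints restores the original string, which is one possible way around the search problem where the output format allows post-processing.
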