Re: Getting our tables to render better in PDF output

Alexander Lakhin Wed, 12 Feb 2020 21:01:12 -0800

12.02.2020 23:58, Tom Lane wrote:
> Alexander Lakhin <[email protected]> writes:
>> Please look at a less invasive approach that we have been using at
>> Postgres Pro for some time (mainly for improving the translated
>> documentation, but it works for the original one too). The idea is to
>> add zero-width spaces after/before some chars ('(', ',', '[', etc.) to
>> let fop split lines where desired. It has one disadvantage: it's not
>> search-friendly (though maybe that is application-dependent).
>> But if it's feasible, I think this approach can at least complement
>> manual reformatting of the tables. Decreasing the font size in the
>> tables seems appropriate to me too.
> Hmm, interesting proposal.  I experimented and verified that injecting
> zero-width space (&#x200B;) does allow line breaking to occur in both
> HTML and PDF output, so this could be a route to improving the situation
> for overlength example texts.  I do not think I like the idea of
> automatically injecting tons of them, though.  As you say, it might
> hinder searching; and it would allow some silly breaks; and there are
> cases where it still wouldn't find a break, such as the examples for
> sha256() et al.  I'd be happier about manually inserting breaks just
> in the places we really need them.  To keep the source readable, I'd
> want to write something like "&zwsp;" not a numeric entity code,
> but it looks like we can define custom entities if we want.
Yes, I started with manual &zwsp; insertions in the translation,
but later I reduced them to just a few dozen. (For
example, we still have "3.1415926535&zwsp;8979323846" in the translation.)
The main issue with the manual approach was that I had to recheck the
zwsp placement on every update, and I can't see where a break is needed
until I generate the PDF. Fortunately, fop prints warnings like this:
[WARN] FOUserAgent - The contents of fo:block line 2 exceed the
available area in the inline-progression direction by 22725 millipoints.
(See position 127769:983)
It's not very user-friendly, but it is still useful when there are only
a couple of them. (Currently I see 559 such warnings in REL_12_STABLE.)
The second issue is that the placement can depend on the page size, and
in fact most of those zwsps are not needed for HTML or other formats.
Moreover, some formats may require different placements (unless we just
implement some common rules).
The third (minor) issue concerns translation: when I see a break in
the English source, e.g. "split_part('abc~@~def&zwsp;~@~ghi', '~@~',
2)", should I leave the break in the same place, or is it better to move
it, since the adjacent text has a different length and the table columns
have different widths?
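To triage fop warnings like the one quoted above, a small script could list them worst-first. This is only a sketch: it assumes the warning text matches the sample above (other fop versions may word it differently) and that "position N:M" is a line:column pair in the generated FO file. The helper name find_overflows is hypothetical and not part of the docs build.

```python
import re

# Matches the overflow warning quoted above; exact wording may vary
# between fop versions (assumption).
OVERFLOW_RE = re.compile(
    r"The contents of fo:block line (\d+) exceed the available area "
    r"in the inline-progression direction by (\d+) millipoints\. "
    r"\(See position (\d+):(\d+)\)"
)


def find_overflows(log_text):
    """Return (overflow_in_points, pos_line, pos_col) tuples, worst first."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    results = []
    for m in OVERFLOW_RE.finditer(flat):
        millipoints = int(m.group(2))
        # 1000 millipoints = 1 point.
        results.append((millipoints / 1000, int(m.group(3)), int(m.group(4))))
    # Largest overflow first, so the worst tables get fixed first.
    return sorted(results, reverse=True)
```

Feeding it fop's stderr gives a ranked list instead of hundreds of unordered warnings; len() of the result gives the total count.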


To me, this approach expresses the belief that line-breaking rules
should be slightly different in our context. For example, a line
break after an opening bracket is reasonable and common in function calls
and declarations. Maybe the rules in the proposed XSLT could be
improved or restricted, but I think that if fop allowed us to enable an
imaginary "programming-language line-breaking" mode, we would use
it for some or all of our tables.
Some of the rules could perhaps be applied explicitly in the DocBook
source, just to reduce the mass of zwsps in the generated output, or the
"fo:table-cell/fo:block//text()" condition could be refined to filter
out some (text-only?) tables, but I think the idea of project-specific
line-breaking rules could work.
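As a rough illustration of such a rule (a Python sketch, not the actual XSLT, which works on fo:table-cell/fo:block//text() nodes): insert U+200B after '(', '[' and ',' unless a space or an existing zero-width space already follows. The character set and the name add_break_opportunities are illustrative assumptions.

```python
ZWSP = "\u200b"  # zero-width space, the same character as &#x200B;

# Hypothetical set of characters after which a break is allowed.
BREAK_AFTER = set("([,")


def add_break_opportunities(text):
    """Insert a zero-width space after each BREAK_AFTER character,
    unless it is the last character or is already followed by
    whitespace (a normal break point) or a zero-width space."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch in BREAK_AFTER:
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt and nxt != ZWSP and not nxt.isspace():
                out.append(ZWSP)
    return "".join(out)
```

Skipping characters already followed by a space or a zero-width space keeps the function idempotent and avoids redundant breaks, though, as noted earlier in the thread, the inserted characters still get in the way of searching.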

Best regards,
Alexander
