On 23/04/2024 02:01, Ihor Radchenko wrote:
> For example, consider an HTML exporter that aligns tags nicely and
> keeps blank lines between markup blocks for readability. If we
> remove such blank lines unconditionally, it will be problematic.
I think newlines alone are enough to make HTML markup human-readable.
I believe blank lines appear in HTML because of conditional constructs
interpreted by various template engines, and almost nobody cares about
the actual formatting in such cases.
However, I proposed making this feature an option that is turned on by
default.
I guess I can change the condition for not including trailing space
from (rx whitespace eol) to (rx (any " \t") eol).
Once again I forgot that neither \n nor a non-breaking space is
included in post-blank.
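
To illustrate the difference between the two conditions, here is a
minimal sketch. The my/ names are hypothetical helpers made up for this
message, not part of Org, and the sketch only checks plain strings
rather than real parse tree objects.

```emacs-lisp
;; Hypothetical names for illustration only; nothing here is Org API.
(defconst my/trailing-blank-strict-rx (rx (any " \t") eol)
  "Match a plain space or tab right before an end of line.")

(defconst my/trailing-blank-broad-rx (rx whitespace eol)
  "Match any character with whitespace syntax right before an end of line.
What counts as whitespace here depends on the current syntax table.")

(defun my/ends-with-plain-blank-p (string)
  "Return t when STRING has a space or tab right before an end of line."
  (and (string-match-p my/trailing-blank-strict-rx string) t))

(my/ends-with-plain-blank-p "text ")       ; t, plain space
(my/ends-with-plain-blank-p "text\t")      ; t, tab
(my/ends-with-plain-blank-p "text\n")      ; nil, newline is not in the set
(my/ends-with-plain-blank-p "text\u00a0")  ; nil, neither is U+00A0
```

With my/trailing-blank-broad-rx the last two results may differ, since
they depend on the syntax class that the current syntax table assigns
to newline and to the non-breaking space.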
I think a more permissive regexp may be used. At least it should accept
newlines and any whitespace after them:
(rx (or (any " \t") eol) (zero-or-more whitespace) eos)
Moreover, the post-blank of a pruned object may be ignored when the
following element starts with whitespace other than purely zero-width
characters.
In my opinion, keeping extra spaces (e.g. post-blank ones from pruned
objects) does less harm than aggressively stripping them. Anyway, some
backends must normalize spaces (while for others they do not matter).
As long as newline characters are not affected, this part of the change
has no effect on accidental splitting of paragraphs.
My feeling is that an extensive test suite is required. It would make
it easier to review which cases are not handled yet.