On Thu, Mar 27, 2025 at 09:27:48PM -0500, G. Branden Robinson wrote:
At 2025-03-27T01:00:17+0000, Colin Watson wrote:
I still very much don't understand how po4a-translate would work with
this sort of approach.  My understanding is that the only way that you
could take a preprocessed version of the document, feed it into po4a,
and expect to get useful results out of the po4a-translate stage would
be if you could round-trip from your preprocessed form back to
something closely resembling the original document - and
round-tripping entire pages through POD (rather than just the
translatable bits) seems like an unnecessarily hard problem to solve,
and probably not viable for a large corpus.
[snip]

Thanks a lot for your thoughtful response.  I can't rebut most of your
points, in large part because I have, it now seems to me, a deficient
grasp of how po4a is used in the field.

I may have gotten carried away by Martin Quinson's enthusiastic response
to my pitch, thinking I was facing down a tangle of hemp while equipped
with a strong sword arm, a sharp blade, and a hungry eye for Asia.

To be clear, I agree this is a problem well worth solving, so I certainly don't want to be applying stop-energy to it. Especially if it could manage to make mdoc pages usefully translatable ...

Is that helpful?  I realize that preserving fragments of the original
markup may not actually be possible with your current implementation
vision,

Yes, that's intractably hard or even computationally impossible (because
irreversible macro interpolations, et al., have already taken
place)--under the strictly confined alternative-node-output scheme I had
in mind.

It could be that the problem is still solvable with a technique similar
to that used for grohtml, combined with how I envision refactoring the
troff/grohtml relationship, and that is by pushing more "tagging" work
into the macro packages themselves.  In this case, man(7) and mdoc(7),
of course.

Another suggestion: I realize that groff's idea of the current line number etc. is not always 100% accurate at the moment. Might it be tractable to remember enough information while processing macros to fix that? If groff could emit accurate positional information along with each chunk of text it emits in this sort of mode, then a po4a module would be able to put things back together. (I suppose groff would also need to report the position of the _end_ of the chunk of the input stream corresponding to each chunk of emitted text.)

Or is that what you're referring to by macro tagging?
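
As a rough illustration of the positional idea, here is a minimal Python sketch. It assumes groff could report, for each chunk of emitted text, the start and end offsets of the corresponding span of the input. The `Chunk` record, the `splice` function, and the offsets-as-character-indices convention are all hypothetical; nothing like this exists in groff or po4a today.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # offset of the first input character of the chunk (assumed)
    end: int    # offset one past the last input character (assumed)
    text: str   # the text as emitted, used here as the msgid


def splice(source: str, chunks: list[Chunk], translations: dict[str, str]) -> str:
    """Rebuild the document, swapping each reported span for its translation.

    Everything outside the reported spans (macro calls, comments, and so
    on) is copied through untouched. Untranslated chunks keep their
    original source text.
    """
    out = []
    pos = 0
    for chunk in sorted(chunks, key=lambda c: c.start):
        if chunk.start < pos:
            raise ValueError("overlapping chunks")
        out.append(source[pos:chunk.start])
        out.append(translations.get(chunk.text, source[chunk.start:chunk.end]))
        pos = chunk.end
    out.append(source[pos:])
    return "".join(out)


if __name__ == "__main__":
    src = ".SH NAME\nfoo \\- do a thing\n"
    body = src.index("foo")
    chunks = [Chunk(4, 8, "NAME"), Chunk(body, len(src) - 1, "foo - do a thing")]
    msgstrs = {"NAME": "NOM", "foo - do a thing": "foo \\- faire une chose"}
    print(splice(src, chunks, msgstrs), end="")
```

The sketch only works because both ends of each span are known, which is why the end position would need reporting as well as the start.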

Another possibility would be to make groff actually responsible for injecting translated strings, somewhat in the way that Martin was wondering about in https://github.com/mquinson/po4a/issues/527#issuecomment-2366953012, by providing it a .po file or something similar. The main difficulty I can imagine here is that either it would need to know exactly which strings po4a had produced as msgstrs, or po4a would need to refrain from making any additional tweaks to msgstrs; you'd also need a way to reverse transformations such as font changes to B<...> and the like. And I suppose I expected this to be too much scope creep for groff.
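
To show what reversing a font transformation might involve, here is a deliberately simplistic Python sketch that turns `B<...>` and `I<...>` spans in a msgstr back into roff font escapes. The function name `pod_fonts_to_roff` is hypothetical, the sketch ignores nesting and any escaping of angle brackets, and it cannot know whether the original page used an escape such as `\fB` or a macro such as `.B`, which is part of why this is hard in general.

```python
import re

# Matches a non-nested B<...> or I<...> span (an assumption made for
# brevity; real msgstrs may nest these or contain escaped brackets).
_FONT_SPAN = re.compile(r"([BI])<([^<>]*)>")


def pod_fonts_to_roff(msgstr: str) -> str:
    """Rewrite B<text>/I<text> as \\fBtext\\fP / \\fItext\\fP."""
    return _FONT_SPAN.sub(
        lambda m: "\\f" + m.group(1) + m.group(2) + "\\fP", msgstr
    )


if __name__ == "__main__":
    print(pod_fonts_to_roff("Run B<po4a> with I<file>."))
```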

It's also possible I'm missing something about po4a! Martin said that some formats delegate all the parsing to an external tool, but in digging through po4a I wasn't able to find any examples of this. Having examples of the approaches available would be quite helpful.

And I wouldn't be surprised to find other C commands in the grout for
mostly-English prose; what if somebody described an approach as
"naïve", for instance?

_That_, I think, on the other hand, will be relatively rare.  Most
people writing man(7) (or mdoc(7), for that matter), seem to be stumped
by how to input such words, so they do something non-portable or just
give up the attempt, and degrade their input to Basic Latin ("ASCII").

Maybe a more realistic example would be author names, such as one of those found in po4a(1p). Sometimes those are just in a bare list of names and email addresses, as in that case, and so don't really need translation (although I'm not sure how groff would be in a position to determine that reliably); but sometimes they appear in short sentences crediting a contributor for something specific.

In this case, po4a(1p) went for just entering the name in question in UTF-8 and not worrying about portability to AT&T troff, and to be honest I expect that to be the default among (ahem) naïve authors. These days many people just assume that UTF-8 input will work, and since it mostly does work with modern groff, you have to know a certain amount about troff history to even know that there's a problem here that you might be stumped by how to solve.

--
Colin Watson (he/him)                              [cjwat...@debian.org]
