Re: Translating manpages into several idioms (gettextization)

Colin Watson Fri, 28 Mar 2025 04:53:41 -0700

On Thu, Mar 27, 2025 at 09:27:48PM -0500, G. Branden Robinson wrote:

At 2025-03-27T01:00:17+0000, Colin Watson wrote:

I still very much don't understand how po4a-translate would work with
this sort of approach.  My understanding is that the only way that you
could take a preprocessed version of the document, feed it into po4a,
and expect to get useful results out of the po4a-translate stage would
be if you could round-trip from your preprocessed form back to
something closely resembling the original document - and
round-tripping entire pages through POD (rather than just the
translatable bits) seems like an unnecessarily hard problem to solve,
and probably not viable for a large corpus.

[snip]


Thanks a lot for your thoughtful response.  I can't rebut most of your
points, in large part because I have, it now seems to me, a deficient
grasp of how po4a is used in the field.

I may have gotten carried away by Martin Quinson's enthusiastic response
to my pitch, thinking I was facing down a tangle of hemp while equipped
with a strong sword arm, a sharp blade, and a hungry eye for Asia.

To be clear, I agree this is a problem well worth solving, so I
certainly don't want to be applying stop-energy to it. Especially if it
could manage to make mdoc pages usefully translatable ...

Is that helpful?  I realize that preserving fragments of the original
markup may not actually be possible with your current implementation
vision,


Yes, that's intractably hard or even computationally impossible (because
irreversible macro interpolations, et al., have already taken
place)--under the strictly confined alternative-node-output scheme I had
in mind.

It could be that the problem is still solvable with a technique similar
to that used for grohtml, combined with how I envision refactoring the
troff/grohtml relationship, and that is by pushing more "tagging" work
into the macro packages themselves.  In this case, man(7) and mdoc(7),
of course.

Another suggestion: I realize that groff's idea of the current line
number etc. is not always 100% accurate at the moment. Might it be
tractable to remember enough information while processing macros to fix
that? If groff could emit accurate positional information along with
each chunk of text it emits in this sort of mode, then a po4a module
would be able to put things back together. (I suppose groff would also
need to report the position of the _end_ of the chunk of the input
stream corresponding to each chunk of emitted text.)


Or is that what you're referring to by macro tagging?
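
To make the positional-information idea concrete, here is a minimal
sketch of how a po4a-like consumer might put things back together. It
assumes a purely hypothetical report format in which each chunk of
emitted text carries the start and end offsets of the input it came
from; the `Chunk` type, `splice_translations`, and the sample page are
invented for illustration and are not part of groff or po4a.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # offset of the first source character of the chunk
    end: int    # offset one past the last source character of the chunk
    text: str   # translatable text reported for this span (hypothetical)


def splice_translations(source, chunks, translations):
    """Replace each reported span with its translation.

    Everything between spans (requests, macro calls, escapes) is copied
    through untouched, which is what makes the end offset necessary.
    """
    out = []
    pos = 0
    for chunk in sorted(chunks, key=lambda c: c.start):
        if chunk.start < pos:
            raise ValueError("overlapping chunks")
        out.append(source[pos:chunk.start])
        # Fall back to the original text when no translation exists.
        out.append(translations.get(chunk.text, chunk.text))
        pos = chunk.end
    out.append(source[pos:])
    return "".join(out)


source = ".SH NAME\nfoo \\- frobnicate files\n"


def chunk_for(text):
    # Stand-in for the positions the formatter would report.
    start = source.index(text)
    return Chunk(start, start + len(text), text)


chunks = [chunk_for("NAME"), chunk_for("frobnicate files")]
translations = {
    "NAME": "NOM",
    "frobnicate files": "frobniquer des fichiers",
}
result = splice_translations(source, chunks, translations)
print(result)
```

The sketch only works if both offsets are exact; a start position
without a matching end position would leave no way to know how much of
the original input a translation replaces.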

Another possibility would be to make groff actually responsible for
injecting translated strings, in sort of the way that Martin was
wondering about in
https://github.com/mquinson/po4a/issues/527#issuecomment-2366953012, by
providing it a .po file or something similar. The main difficulty I can
imagine here is that either it would need to know exactly which strings
po4a had produced as msgstrs, or po4a would need to refrain from making
any additional tweaks to msgstrs; you'd also need a way to reverse
transformations such as font changes to B<...> and the like. And I
suppose I expected this to be too much scope creep for groff.
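
As a rough illustration of that difficulty, the sketch below looks up a
string in a catalogue by exact match after applying the same kind of
font-change rewriting mentioned above, then reverses the rewriting on
the translated string. The catalogue is a plain dictionary standing in
for a .po file, and the two regular expressions are simplified
assumptions, not po4a's actual rules.

```python
import re


def groff_to_po4a(text):
    # Assumed normalization: \fBword\fR (or \fP) becomes B<word>,
    # and likewise for \fI. Real po4a handling is more involved.
    return re.sub(r"\\f([BI])(.*?)\\f[RP]", r"\1<\2>", text)


def po4a_to_groff(text):
    # Reverse direction. Always closing with \fR loses the distinction
    # between \fR and \fP in the original input.
    return re.sub(r"([BI])<([^<>]*)>", r"\\f\1\2\\fR", text)


def translate(groff_text, catalogue):
    """Return the translated string, or the input if nothing matches."""
    msgstr = catalogue.get(groff_to_po4a(groff_text))
    if msgstr is None:
        return groff_text
    return po4a_to_groff(msgstr)


# Hypothetical catalogue: normalized source string -> translation.
catalogue = {"Run B<foo> first.": "Exécutez d'abord B<foo>."}

translated = translate(r"Run \fBfoo\fR first.", catalogue)
# Any extra tweak on either side (here, a doubled space) breaks the match.
unmatched = translate(r"Run \fBfoo\fR  first.", catalogue)
print(translated)
print(unmatched)
```

The second call shows why the two sides would have to agree exactly on
every transformation: the lookup silently falls back to the untranslated
text as soon as the keys differ by a single character.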

It's also possible I'm missing something about po4a! Martin said that
some formats delegate all the parsing to an external tool, but in
digging through po4a I wasn't able to find any examples of this. Having
examples of the approaches available would be quite helpful.

And I wouldn't be surprised to find other C commands in the grout for
mostly-English prose; what if somebody described an approach as
"naïve", for instance?


_That_, I think, on the other hand, will be relatively rare.  Most
people writing man(7) (or mdoc(7), for that matter), seem to be stumped
by how to input such words, so they do something non-portable or just
give up the attempt, and degrade their input to Basic Latin ("ASCII").

Maybe a more realistic example would be author names, such as one of
those found in po4a(1p). Sometimes those are just in a bare list of
names and email addresses, as in that case, and so don't really need
translation (although I'm not sure how groff would be in a position to
determine that reliably); but sometimes they appear in short sentences
crediting a contributor for something specific.

In this case, po4a(1p) went for just entering the name in question in
UTF-8 and not worrying about portability to AT&T troff, and to be honest
I expect that to be the default among (ahem) naïve authors. These days
many people just assume that UTF-8 input will work, and since it mostly
does work with modern groff, you have to know a certain amount about
troff history to even know that there's a problem here that you might
be stumped by how to solve.

--
Colin Watson (he/him)                              [cjwat...@debian.org]
