tl;dr: We need more docs about best practices to handle translations as 
a package maintainer


My fellow developers,

there are two things about being a DD that I truly despise. One is 
keeping the machine-readable debian/copyright up to date, the other is 
handling translations, regardless of whether it's po-debconf, manpage 
translations or program translations (when I am also the upstream).

I might rant about debian/copyright when I blow my fuse next time, but 
today it's going to be translations. To me, it seems impossible to 
support translators' work without taking a significant burden of 
additional work on oneself. Especially when one uses version 
control and does not do all development in Mast^wdebian/latest, dealing 
with translations is a nightmare when it comes to merging. As the Newbie¹ 
DD I am, I keep running into either nightmare merges, or unnecessarily 
fuzzed or even destroyed translations, in all cases feeling even more 
stupid and incompetent when some translator points out my mistakes. I 
have felt sent back and forth between different workflows (all of 
them wrong) by following random advice, without being able to find an 
authoritative explanation.

I might have a fundamental misunderstanding of the procedures, but all 
the documentation one finds, including Chapter 8 of the Developer's 
Reference (which links to a document by Tomohiro KUBOTA that is no 
longer there), elaborates on how one would do actual translations but 
doesn't go as far as giving best-practice guidance on what a 
package maintainer is supposed to do to make translations blend into a 
normal packaging workflow without being a nuisance ("put them into the 
po directory and build the package" doesn't fit a modern packaging 
workflow using version control).

My example package is adduser, but I think that my questions might apply 
to other packages as well. adduser has both its program messages and the 
manual pages translatable, the latter being done with po4a. I am aware 
that there are also translations for debconf templates, but adduser 
doesn't have those (any more). I think the problems that show with 
debconf template translations are similar to the pain one feels with 
documentation and program translations. I actively avoid using debconf 
in my packages because I don't want to go through the pain that handling 
translations causes, and many parts of Debian consider it bad practice 
to use debconf. But that's an entirely different rant.

Handling translations does hurt when the sources are stored in version 
control because I constantly end up having changes to pot and po files 
in commits where they don't belong, or uncommitted changes that prevent 
me, for example, from doing rebases. I have tried to build a workflow 
that doesn't hurt me as package maintainer as much, but it has turned 
out that this doesn't work because many translators don't care.

Please don't get me wrong, I know this is a rant, and I know that 
you'll notice that I am typing this with my fists clenched. But my time, 
my nerves and my mental health are as important as being nice 
to my translators. I do care, but sometimes it's really hard to maintain 
a straight and friendly face while cursing our tools and docs inside.

Whenever I am angry about something in Debian, I start writing docs. So 
I try this here, but here I don't know enough to be really helpful. I 
hope that this rant will start a positive discussion with actual results 
that I could pour into a Wiki page that might actually help with the 
pain I am feeling, and that I assume many other maintainers feel as well.

Let me try to summarize what I have understood regarding translations 
and what my problems are with that.

(1)
When writing software, docs or debconf templates, the respective author 
marks certain strings as translatable. There are a number of conventions 
for doing so, which depend on the programming language. Let's assume 
that has been done the right way; there are docs about this.

(2)
There is some point in the development process when it is time to ask 
for translations. Translators need a POT file which contains all the 
translatable strings, and they make a PO file from that which contains 
the actual translation.

(3)
Some program (xgettext for program translations, po4a for manual pages 
and some podebconf tool for debconf templates) is used to pull the 
translatable strings from the source code and to create a POT file. 

xgettext doesn't even try to create a meaningful header and overwrites 
whatever one has written into the previous version of the POT file, so a 
wrapper is already needed to have a header that translators can fill in 
(which they usually don't do).

For adduser's program translations, my call to xgettext is²:

  xgettext --keyword=mtx --keyword=gtx --omit-header -o "$TEMP_FILE" \
    --from-code=UTF-8 -L perl adduser deluser $(find . -name "*.pm")

TEMP_FILE then gets the generated header prepended to result in 
adduser.pot.
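
For illustration, the "prepend a header" step could look roughly like the 
sketch below. The function name assemble_pot and the file po/header.pot 
are hypothetical and not taken from adduser; the only assumption is that 
the header lives in a hand-maintained file and that the message body 
comes from an xgettext run with --omit-header.

```shell
#!/bin/sh
# Sketch only: assemble_pot and the paths in the usage note are
# hypothetical, not adduser's actual layout.

# assemble_pot HEADER BODY OUTPUT
# Writes HEADER, one blank separator line, and BODY (the output of
# "xgettext --omit-header") to OUTPUT.
assemble_pot() {
    header=$1
    body=$2
    output=$3
    tmp="$output.tmp.$$"
    { cat "$header" && echo && cat "$body"; } > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    # Rename only once the new file is complete, so a failed run
    # never leaves a truncated POT behind.
    mv "$tmp" "$output"
}
```

It would be called after the xgettext run, for example as 
assemble_pot po/header.pot "$TEMP_FILE" po/adduser.pot. Keeping the 
header in its own file means it can be reviewed and versioned like any 
other source file instead of being regenerated on every run.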

I have seen this being done in debian/rules' clean target which, in 
in-tree builds, causes the POT file to be changed as well and I don't 
understand at which step of the packaging process it would be a good 
idea to commit that POT file. If I build my package out of tree (like I 
do out of tradition of svn-buildpackage, I have gbp configured to use 
../build-area), the POT file ends up newly generated in the source 
package but never gets updated in git. Adduser had POT files from 2022 
in git until just recently because I just never noticed. There is no 
lintian check and no check inside tracker.d.o for this.
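
In the absence of such a check, a stale POT file in git could be caught 
locally with something like the following sketch. It assumes a freshly 
generated POT is available in a scratch file; pot_is_current is a 
hypothetical helper, and ignoring the "#:" location comments is a design 
choice of this sketch, not an established convention.

```shell
#!/bin/sh
# pot_is_current COMMITTED REGENERATED
# Succeeds when the two POT files are identical apart from the
# POT-Creation-Date header and the "#:" source-location comments,
# both of which can change without any string having changed.
pot_is_current() {
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    grep -v -e '^"POT-Creation-Date:' -e '^#: ' "$1" > "$a"
    grep -v -e '^"POT-Creation-Date:' -e '^#: ' "$2" > "$b"
    cmp -s "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

A pre-upload hook or CI job could regenerate the POT into a temporary 
file and fail when pot_is_current po/adduser.pot "$FRESH_POT" returns 
non-zero, so that a POT from 2022 cannot linger unnoticed.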

In other packages, there is a dedicated m4 macro to call xgettext which 
doesn't make things easier to understand.

(4)
Then,
msgmerge --update --backup=none --no-fuzzy-matching "${PO_FILE}" "${POT_FILE}"
is called for every existing PO file. This doesn't move the header from 
the POT file to the existing PO file so stupidities like "# COPYRIGHT 
THE PACKAGE CREATOR" just never get fixed because the translators don't 
seem to care.

If a po file for a language already exists during this step, the already 
existing translation gets merged into the new po file. In some 
circumstances that I have not understood yet, the translation gets 
"fuzzied", which I have been told causes a lot of unnecessary and 
repeated work for the translators which I am supposed to avoid by doing 
manual work myself which I don't understand. Not doing this work is 
condemned as "not being nice to translators".

Basically the same applies to this step as to the POT generation 
step, with the additional hardship that the PO files are generated, 
written to by a program, AND STILL contain a significant part of 
human work. I never know how much work of other people I am destroying 
by calling msgmerge out of line. In which stage of package build do I do 
msgmerge? Do I commit the merged po files, when do I commit them, what 
do I do with them during git merge when a feature branch is merged?
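
One way to see how much translator work a merge would touch before it 
touches anything is to run msgmerge into a scratch file and compare the 
number of fuzzy entries. As far as the gettext documentation describes 
it, msgmerge flags an entry fuzzy when it guesses that a new msgid 
corresponds to a similar old one; with --no-fuzzy-matching that guessing 
is switched off and changed strings come out untranslated instead. The 
helper names below are hypothetical, and the count is deliberately rough.

```shell
#!/bin/sh
# fuzzy_count FILE
# Prints the number of entries flagged fuzzy. Rough: a fuzzy header
# and fuzzy obsolete entries are counted as well.
fuzzy_count() {
    grep -c '^#,.*fuzzy' "$1" || true
}

# preview_merge PO POT
# Merges into a scratch file and reports the fuzzy count before and
# after, leaving PO itself untouched. Needs msgmerge from gettext.
preview_merge() {
    po=$1
    pot=$2
    scratch=$(mktemp) || return 2
    msgmerge --quiet --no-fuzzy-matching -o "$scratch" "$po" "$pot" || {
        rm -f "$scratch"
        return 1
    }
    echo "$po: fuzzy $(fuzzy_count "$po") -> $(fuzzy_count "$scratch")"
    rm -f "$scratch"
}
```

Running preview_merge over every PO file in a loop gives a per-language 
summary that can be checked before deciding whether to commit the merged 
files.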

(5)
podebconf-report-po is used to generate the calls for translation. One 
message is sent to this mailing list with the pot file attached, and for 
each existing po file, the translators listed in that file get an 
individual mail with just the respective po file attached.

If the msgmerge step is forgotten, they get an already translated po 
file that doesn't match the pot and therefore is useless.

In theory, for an already existing package, the POT file is not needed, 
right?

(6)
Depending on the age of the existing translations, about half of the 
messages I send to individual translators are going to bounce. Am I 
supposed to report that to debian-i18n@l.d.o as a followup to the 
general translation request so that new translators can take up the 
outdated translatorless translations? Or am I supposed to send the 
general translation request to debian-i18n last so that I can explicitly 
mention the translatorless languages there?

(8)
When a translator does a translation, they send me a new po file 
containing the actual translation. If it's a new language, they start 
with the pot file that hopefully has the correct header, and if it's an 
existing language, they start with the old po file, which almost always 
has a historically grown header that is in more or less dire need of 
streamlining and cleaning. They either take the PO/POT file from the 
e-mail attachment, use a package they pulled from the archive³ or 
pull the PO/POT file from git.

They usually don't bother with the header or copyright, so things like 
package name, licenseª, Project-Id-Version and PO-Revision-Date are 
often questionable, unclear or just plain wrong, or they cause extra work 
for the package maintainer because, for example, a different license was 
chosen than the one the package is actually licensed under, either out of 
incompetence or out of not caring.

Am I supposed to fix those headers in the po file myself? Am I supposed 
to ask the translator to fix the headers? Or am I supposed to just 
ignore all of that and just accept whatever I get sent? I often feel 
like a smart-mouthed know-it-all when I ask a translator to improve the 
headers of their PO file.
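
Whoever ends up fixing the headers, spotting the untouched ones can be 
automated. The sketch below looks for the placeholder strings that 
xgettext writes into its default header; check_po_header is a 
hypothetical helper, and it assumes that the header entry ends at the 
first blank line of the file.

```shell
#!/bin/sh
# check_po_header FILE
# Prints header lines that still contain xgettext's default
# placeholders and fails if there are any. Only the part of the file
# up to the first blank line is scanned (assumed to be the header).
check_po_header() {
    if sed '/^$/q' "$1" | grep -n -F \
        -e 'SOME DESCRIPTIVE TITLE' \
        -e "THE PACKAGE'S COPYRIGHT HOLDER" \
        -e 'PACKAGE VERSION' \
        -e 'EMAIL@ADDRESS' \
        -e 'YEAR-MO-DA' \
        -e 'LL@li.org'
    then
        return 1
    fi
    return 0
}
```

This only detects placeholders that were never filled in. Whether the 
license named in a header matches the package's license still needs a 
human to look at it.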

(9)
I then commit the po file the translator sent me to version control.

(10)
And then I eventually release the package.

In theory, it would probably be good to do all that regeneration when 
preparing a package for release. Why don't we have a debian/rules target 
like debian/rules prepare-release that might be useful for that? How 
could we protect ourselves against uploading a package with outdated 
POT/PO files? People make mistakes.

How am I supposed to handle the unavoidable differences between git 
branches, that are probably easier to solve when I am just merging a 
feature branch but can be a major pain when merging suite branches like 
experimental, stable, unstable where translation work has already been 
done?

There must be some smarter method when merging to mas^wdebian/unstable 
than (1) move away all po files, (2) merge, (3) ignore all merge 
conflicts in po files, (4) regenerate POT, (5) restore po files moved 
away in step 1, (6) msgmerge, (7) do a dedicated translation commit 
(one? or one per file?).
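
Written down literally, the steps above might look like the sketch 
below. It is not a smarter method, only the manual procedure in script 
form: merge_keeping_po is a hypothetical name, the PO directory is passed 
in as an argument, and just like the manual procedure it throws away any 
PO changes made on the branch being merged.

```shell
#!/bin/sh
# merge_keeping_po BRANCH PO_DIR
# Merges BRANCH into the current branch while keeping this branch's
# PO files exactly as they were before the merge.
merge_keeping_po() {
    branch=$1
    po_dir=$2
    backup=$(mktemp -d) || return 2
    cp -p "$po_dir"/*.po "$backup"/                     # (1) set our PO files aside
    git merge --no-ff --no-commit "$branch" || true   # (2)+(3) PO conflicts are expected
    cp -p "$backup"/*.po "$po_dir"/                     # (5) put our PO files back
    git add -- "$po_dir"                                 # marks the PO conflicts as resolved
    rm -rf "$backup"
    # Anything still unmerged is a real conflict outside PO_DIR.
    if [ -n "$(git diff --name-only --diff-filter=U)" ]; then
        echo "conflicts outside $po_dir remain; resolve them first" >&2
        return 1
    fi
    # (4) regenerate the POT and (6) run msgmerge here, then
    # (7) commit the result.
}
```

The function deliberately stops before committing, so that the POT 
regeneration and the msgmerge run can go into the same merge commit or 
into a dedicated translation commit afterwards, whichever turns out to 
be the better convention.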

I have caused enough breakage in adduser in the last weeks and have 
wasted enough time of both translators and myself. For the time being, I 
am halting all my efforts to be "nice to translators" to avoid breaking 
more things and to keep the chaos in my package trees and version 
control low. You can read from my git histories pretty well that I 
know zilch about what I am doing.

This has to stop as long as being nice to translators means multiplying 
my own degree of unthanked busy work.

This happens during a crucial part of adduser development since we are 
nearing the freeze, but first I need to build knowledge that I should 
have built 25 years ago but that no one bothered to document. I really 
don't know how translations in Debian have come up to THIS point in the 
absence of serious docs. Maybe my fellow DDs are smarter than I am with 
all tools involved.

Thanks for reading up to this point. Writing this message alone has cost 
me three hours of my time that I'd rather have put in productive 
packaging work, and a sleepless night. You know, when I blow a fuse, I 
rant, and then I start writing docs. I guess when I put the result of 
this discussion in a wiki page, it should be under i18n, right? I am 
inclined to put on https://wiki.debian.org/I18n a dedicated chapter 
titled "for package maintainers", probably between "Keyboard input 
infrastructure support" and "Meetings" as this is a matter beyond the 
internals of the translation teams and the i18n effort. Am I on the 
correct track with that?

Ich habe fertigǂ. Thanks in advance,
Marc Haber

¹ I have only been a DD for a bit more than two decades
² adduser has strings that get used in both translated and untranslated 
form, making sure that messages written to the console are translated 
and messages written to syslog are written in English to make handling 
bug reports easier
³ I have received translations that were obviously done against the POT 
file from stable.
ª I have received translations that placed the translation under the 
same copyright as $SOME_OTHER_PACKAGE.
ǂ https://en.wikipedia.org/wiki/Giovanni_Trapattoni#In_popular_culture

Attachment: adduser.pot.gz