Re: General Questions about Translations and what a package maintainer has to do

Dirk Lehmann Tue, 11 Mar 2025 19:04:10 -0700

Hello Marc,

I am still new as a Debian maintainer, but I have more experience on
the upstream side.


If I understand your question correctly as being about a general
translation workflow covering all the cases in your email, I think the
short answer may be to use `quilt` to patch the .po files instead of
committing them directly via Git.  Also consider `gbp-pq`; it could be
useful, but I am not sure.

The reason is that you can apply your patch series (a patch series is
something like a branch in Git) at any time during the source-package
build.  As I'm new to Debian packaging, I'm not sure whether
"debian/patches" can be used for this; I think not, because the step
at which those patches are applied is hard-coded in
`dpkg-buildpackage` -- and that is not what we want.

I would recommend using a separate quilt directory, such as
`debian/patches-intl`.  I don't know whether Debian Policy or any
Debian guide recommends a different directory.  In `debian/rules` you
can then apply your patch series at whatever point you wish by calling

  ```
  $> QUILT_PATCHES=debian/patches-intl quilt push -a
  ```

Calling quilt manually in `debian/rules` means that you decide yourself
at which step your patches are applied -- and whether
`dpkg-buildpackage` should abort with an error when a patch fails to
apply, or not.  If the package maintainer changes, it should be easy to
comment out the one `quilt` command in `debian/rules` and remove the
patch series `debian/patches-intl` from the package repository.

To get your changes into upstream you just need to apply (or ask
upstream to apply) the patch series to their repository.  Make sure to
have many small patches; that raises the chance that at least some of
the patches are accepted upstream, even if others are not.

If these patches later come back down from upstream, `quilt push -a`
stops with a message like:

  ```
  Applying patch debian/patches-intl/<PATCHNAME>.patch
  patching file <FILE.EXT>
  Hunk #1 FAILED at <LINE_NUMBER>.
  1 out of 1 hunk FAILED -- rejects in file <FILE.EXT>
  Patch debian/patches-intl/<PATCHNAME>.patch can be reverse-applied
  ```

With the message 'can be reverse-applied', quilt indicates that the
patch was already applied by a third party (here: upstream).  You can
then delete that patch from the patch series with:

  ```
  $> QUILT_PATCHES=debian/patches-intl quilt delete -nr
  ```

followed by a second `quilt push -a` to finish applying the patch
series -- repeating this as often as needed if more than one patch was
accepted by upstream.
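
The push/delete cycle described above could be scripted roughly as
follows.  This is only a sketch: the function name `push_intl_patches`
is made up for illustration, `debian/patches-intl` is the directory
suggested earlier, and it assumes that `quilt push` exits with status 2
when the series is already fully applied.

```shell
#!/bin/sh
# Sketch (hypothetical helper): apply the whole series in
# debian/patches-intl and drop patches that were already merged upstream.

push_intl_patches() {
    export QUILT_PATCHES="${QUILT_PATCHES:-debian/patches-intl}"
    while :; do
        pip_out=$(quilt push -a 2>&1)
        pip_rc=$?
        printf '%s\n' "$pip_out"
        case $pip_rc in
            # 0: everything applied; 2: assumed to mean "nothing left to push"
            0|2) return 0 ;;
        esac
        case $pip_out in
            *"can be reverse-applied"*)
                # The failing patch is the next unapplied one: remove it
                # from the series and delete its file, then try again.
                quilt delete -nr || return 1
                ;;
            *)
                # Any other failure is a real conflict; leave it to a human.
                return "$pip_rc"
                ;;
        esac
    done
}

# Possible use from debian/rules (after sourcing this file):
#   push_intl_patches
```

A conflict that is not of the 'can be reverse-applied' kind still stops
the loop with a non-zero status, so the caller keeps the choice of
failing the build or carrying on.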

Quilt also lets you adapt your patches afterwards, in contrast to Git,
where commits are permanent or have to be undone with a revert commit
(see backport workflows).  That was also an issue you reported.

This workflow should also work if upstream already has translations
committed -- and/or if you start a fresh one.  It may also, though I am
not sure, solve the issue of the multiple calls of `msgmerge`; as I
understand it, `msgmerge` was used as a kind of `patch`.

Hopefully this helps you make progress,
Dirk =)


On 3/11/25 12:03 PM, Marc Haber wrote:

tl;dr: We need more docs about best practices to handle translations as
a package maintainer


My fellow developers,

there are two things in being a DD that I truly despise. The one is
keeping the machine readable debian/copyright up to date, the other is
handling of translations, regardless of whether it's po-debconf, manpage
translations or program translations (when I am also the upstream).

I might rant about debian/copyright when I blow my fuse next time, but
today it's going to be translations. To me, it seems impossible to
support translators' work without putting a significant burden of
additional work on oneself. Especially when one uses version
control and does not do all development in Mast^wdebian/latest, dealing
with translations is a nightmare when it comes to merging. As the Newbie¹
DD I am, I keep running into either nightmare merges, or unnecessarily
fuzzed or even destroyed translations, in all cases feeling even more
stupid and incompetent when some translator points out my mistakes. I
have felt being sent back and forth between different workflows (all of
them wrong) by following random advice without being able to find
authoritative explanation.

I might have a fundamental misunderstanding of procedures, but all
documentation one finds, including Chapter 8 in the Developer's
Reference (which links to a document by Tomohiro KUBOTA which is no
longer there), elaborates on how one would do actual translations, but
doesn't go as far as giving best practice documentation about what a
package maintainer is supposed to do to make translation blend into a
normal packaging workflow without being a nuisance ("put them into the
po directory and build the package" doesn't fit a modern packaging
workflow using version control).

My example package is adduser, but I think that my questions might apply
to other packages as well. adduser has both its program messages and the
manual pages translatable, the latter being done with po4a. I am aware
that there are also translations for debconf templates, but adduser
doesn't have those (any more). I think the problems that show with
debconf template translations are similar to the pain one feels with
documentation and program translations. I actively avoid using debconf
in my packages because I don't want to go through the pain that handling
translations causes, and many parts of Debian consider it bad practice
to use debconf. But that's an entirely different rant.

Handling translations does hurt when the sources are stored in version
control because I constantly end up having changes to pot and po files
in commits where they don't belong, or uncommitted changes that prevent
me, for example, from doing rebases. I have tried to build a workflow
that doesn't hurt me as package maintainer as much, but it has turned
out that this doesn't work because many translators don't care.

Please don't get me wrong, I know this is a rant, and I know that
you'll notice that I am typing this with my fists clenched. But my time
and my nerves and my mental health are as important as it is to be nice
to my translators. I do care, but sometimes it's really hard to maintain
a straight and friendly face while cursing our tools and docs inside.

Whenever I am angry about something in Debian, I start writing docs. So
I try this here, but here I don't know enough to be really helpful. I
hope that this rant will start a positive discussion with actual results
that I could pour into a Wiki page that might actually help with the
pain I am feeling, and which I assume many other maintainers feel as well.

Let me try to summarize what I have understood regarding translations
and what my problems are with that.

(1)
When writing software, docs or debconf templates, the respective author
marks certain strings as translatable. There are a number of conventions
for doing so, which depend on the programming language. Let's assume
that has been done the right way; there are docs about this.

(2)
There is some point in the development process when it is time to ask
for translations. Translators need a POT file which contains all the
translatable strings, and they make a PO file from that which contains
the actual translation.

(3)
Some program (xgettext for program translations, po4a for manual pages
and some podebconf tool for debconf templates) is used to pull the
translatable strings from the source code and to create a POT file.

xgettext doesn't even try to create a meaningful header and overwrites
whatever one has written into the previous version of the POT file, so a
wrapper is already needed to have a header that translators can fill
(which they usually don't do).

For Adduser's program translations, my call to xgettext is:
xgettext --keyword=mtx --keyword=gtx --omit-header -o "$TEMP_FILE"
--from-code=UTF-8 -L perl adduser deluser $(find . -name "*.pm")².
TEMP_FILE then gets the generated header prepended to result in
adduser.pot.
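
Such a wrapper could look roughly like this. It is a sketch only: the
function name `generate_pot` and the assumption that the header already
sits in a file of its own are illustrative; the xgettext options are the
ones quoted above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): build a POT file from a prepared header
# plus the header-less output of xgettext.
# Arguments: HEADER_FILE POT_FILE SOURCE_FILE...

generate_pot() {
    gp_header=$1
    gp_pot=$2
    shift 2
    gp_tmp=$(mktemp) || return 1
    # Same options as the adduser call; the sources are passed in by the caller.
    xgettext --keyword=mtx --keyword=gtx --omit-header -o "$gp_tmp" \
        --from-code=UTF-8 -L perl "$@" || { rm -f "$gp_tmp"; return 1; }
    # Prepend the header to the extracted messages.
    cat "$gp_header" "$gp_tmp" > "$gp_pot"
    rm -f "$gp_tmp"
}

# Possible use (file names are examples, not adduser's real layout):
#   generate_pot po/header.txt po/adduser.pot adduser deluser $(find . -name "*.pm")
```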

I have seen this being done in debian/rules' clean target which, in
in-tree builds, causes the POT file to be changed as well and I don't
understand at which step of the packaging process it would be a good
idea to commit that POT file. If I build my package out of tree (as I
do, following svn-buildpackage tradition; I have gbp configured to use
../build-area), the POT file ends up newly generated in the source
package but never gets updated in git. Adduser had POT files from 2022
in git until just recently because I just never noticed. There is no
lintian check and no check inside tracker.d.o for this.

In other packages, there is a dedicated m4 macro to call xgettext which
doesn't make things easier to understand.

(4)
Then,
msgmerge --update --backup=none --no-fuzzy-matching "${PO_FILE}" "${POT_FILE}"
is called for every existing PO file. This doesn't move the header from
the POT file to the existing PO file so stupidities like "# COPYRIGHT
THE PACKAGE CREATOR" just never get fixed because the translators don't
seem to care.

If a po file for a language already exists during this step, the already
existing translation gets merged into the new po file. In some
circumstances that I have not understood yet, the translation gets
"fuzzied". I have been told that this causes a lot of unnecessary and
repeated work for the translators, and that I am supposed to avoid it by
doing manual work myself, work which I don't understand. Not doing this
work is condemned as "not being nice to translators".

Basically the same applies to this step as to the POT generation
step, with the additional hardship that the PO files are generated,
being written to by a program AND STILL contain a significant part of
human work. I never know how much work of other people I am destroying
by calling msgmerge out of line. In which stage of package build do I do
msgmerge? Do I commit the merged po files, when do I commit them, what
do I do with them during git merge when a feature branch is merged?
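
The msgmerge call from this step, run over all existing PO files, could
be sketched like this. The function name `update_po_files` is
hypothetical; the msgmerge options are the ones quoted above, and the
`msgfmt --statistics` call is an addition that only reports how many
messages are translated, fuzzy or untranslated afterwards.

```shell
#!/bin/sh
# Sketch (hypothetical helper): refresh every PO file against one POT file.
# Arguments: POT_FILE PO_FILE...

update_po_files() {
    upf_pot=$1
    shift
    for upf_po in "$@"; do
        msgmerge --update --backup=none --no-fuzzy-matching \
            "$upf_po" "$upf_pot" || return 1
        # Statistics go to stderr; no .mo file is kept.
        printf '%s: ' "$upf_po"
        msgfmt --statistics -o /dev/null "$upf_po"
    done
}

# Possible use (paths are examples):
#   update_po_files po/adduser.pot po/*.po
```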

(5)
podebconf-report-po is used to generate the calls for translation. One
message is sent to this mailing list with the pot file attached, and for
each existing po file, the translators listed in that file get an
individual mail with just the respective po file attached.

If the msgmerge step is forgotten, they get an already translated po
file that doesn't match the pot and therefore is useless.

In theory, for an already existing package, the POT file is not needed,
right?

(6)
Depending on the age of the existing translations, about half of the
messages I send to individual translators are going to bounce. Am I
supposed to report that to debian-i18n@l.d.o as a followup to the
general translation request so that new translators can take up the
outdated translatorless translations? Or am I supposed to send the
general translation request to debian-i18n last so that I can explicitly
mention the translatorless languages there?

(8)
When a translator does a translation, they send me a new po file
containing the actual translation. If it's a new language, they start
with the pot file that hopefully has the correct header, and if it's an
existing language, they start with the old po file, which almost always
has a historically grown header that is in more or less dire need of
streamlining and cleaning. They either take the PO/POT file from the
e-mail attachment, use a package they pulled from the archive³, or
pull the PO/POT file from git.

They usually don't bother about the header or copyright, so things like
package name, licenseª, Project-ID-Version and PO-Revision-Date are
often questionable, unclear, or just plain wrong, or cause extra work
for the package maintainer because, for example, a license different
from the package's actual license was chosen, either out of
incompetence or out of not caring.

Am I supposed to fix those headers in the po file myself? Am I supposed
to ask the translator to fix the headers? Or am I supposed to just
ignore all of that and just accept whatever I get sent? I often feel
like a smart-mouthed know-it-all when I ask a translator to improve the
headers of their PO file.

(9)
I then commit the po file the translator sent me to version control.

(10)
And then I eventually release the package.

In theory, it would probably be good to do all that regeneration when
preparing a package for release. Why don't we have a debian/rules target
like debian/rules prepare-release that might be useful for that? How
could we protect ourselves against uploading a package with outdated
POT/PO files? People make mistakes.

How am I supposed to handle the unavoidable differences between git
branches, that are probably easier to solve when I am just merging a
feature branch but can be a major pain when merging suite branches like
experimental, stable, unstable where translation work has already been
done?

There must be some smarter method when merging to mas^wdebian/unstable
than (1) move away all po files, (2) merge, (3) ignore all merge
conflicts in po files, (4) regenerate POT, (5) restore po files moved
away in step 1, (6) msgmerge, (7) do a dedicated translation commit
(one? or one per file?).

I have caused enough breakage in adduser in the last weeks and have
wasted enough time of both translators and myself. For the time being, I
am halting all my efforts to be "nice to translators" to avoid breaking
more things and to keep the chaos in my package trees and version
control low. You can read from my git histories pretty well that I don't
know zilch about what I am doing.

This has to stop as long as being nice to translators means multiplying
my own degree of unthanked busy work.

This happens during a crucial part of adduser development since we are
nearing the freeze, but first I need to build knowledge that I should
have built 25 years ago but no one bothered to document. I really don't
know how translations in Debian have come up to THIS point in the
absence of serious docs. Maybe my fellow DDs are smarter than I am with
all tools involved.

Thanks for reading up to this point. Writing this message alone has cost
me three hours of my time that I'd rather have put in productive
packaging work, and a sleepless night. You know, when I blow a fuse, I
rant, and then I start writing docs. I guess when I put the result of
this discussion in a wiki page, it should be under i18n, right? I am
inclined to put on https://wiki.debian.org/I18n a dedicated chapter
titled "for package maintainers", probably between "Keyboard input
infrastructure support" and "Meetings" as this is a matter beyond
the internals of the translation teams and the i18n effort. Am I on the
correct track with that?

Ich habe fertigǂ. Thanks in advance,
Marc Haber

¹ I have only been a DD for a bit more than two decades
² adduser has strings that get used in both translated and untranslated
form, making sure that messages written to the console are translated
and messages written to syslog are written in English to make handling
bug reports easier
³ I have received translations that were obviously done against the POT
file from stable.
ª I have received translations that placed the translation under the
same copyright as $SOME_OTHER_PACKAGE.
ǂ https://en.wikipedia.org/wiki/Giovanni_Trapattoni#In_popular_culture

