Hi Marc,

Here are some of my thoughts based on my (relatively little) experience
with both translating to Arabic and receiving translations.

It might also be worthwhile to forward your message to
debian-i18n@l.d.o, since translators and i18n people are more likely to
be subscribed there and less likely to be subscribed here.

On Tue, 2025-03-11 at 12:03 +0100, Marc Haber wrote:
> (3)
> Some program (xgettext for program translations, po4a for manual pages 
> and some podebconf tool for debconf templates) is used to pull the 
> translatable strings from the source code and to create a POT file. 
> 
> xgettext doesn't even try to create a meaningful header and overwrites 
> whatever one has written into the previous version of the POT file, so a 
> wrapper is already needed to have a header that translators can fill 
> (which they usually don't do).
> 
> For Adduser's program translations, my call to xgettext is:
> xgettext --keyword=mtx --keyword=gtx --omit-header -o "$TEMP_FILE" 
> --from-code=UTF-8 -L perl adduser deluser $(find . -name "*.pm")².
> TEMP_FILE then gets the generated header prepended to result in 
> adduser.pot.

xgettext comes with a ton of options to help you. Have a look at the
diff I've attached for what I've been able to do.

Note that you shouldn't define Plural-Forms in the POT file; that is
set on a per-language basis. There should also be a blank line between
the header and the first message block.
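
To illustrate where the per-language header normally comes from: a new
translation is usually started from the POT with msginit, which fills
in the language-specific header fields itself. A minimal sketch,
assuming gettext is installed; new_po and its default paths are names
I made up for illustration, not something that exists in adduser.

```shell
# Sketch: start a new translation from the POT file.
# new_po is a hypothetical helper, not part of adduser.
new_po() {
    lang="$1"                      # language code, e.g. ar
    pot="${2:-po/adduser.pot}"     # template to start from
    out="${3:-po/${lang}.po}"      # where the new PO file goes
    # --no-translator keeps msginit non-interactive; the translator
    # still has to review the generated header by hand afterwards.
    msginit --no-translator --locale="$lang" --input="$pot" --output-file="$out"
}
```

Something like `new_po ar` would then create po/ar.po from the
current template.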

> I have seen this being done in debian/rules' clean target which, in 
> in-tree builds, causes the POT file to be changed as well and I don't 
> understand at which step of the packaging process it would be a good 
> idea to commit that POT file. If I build my package out of tree (like I 
> do out of tradition of svn-buildpackage, I have gbp configured to use 
> ../build-area), the POT file ends up newly generated in the source 
> package but never gets updated in git. Adduser had POT files from 2022 
> in git until just recently because I just never noticed. There is no 
> lintian check and no check inside tracker.d.o for this.
> 
> In other packages, there is a dedicated m4 macro to call xgettext which 
> doesn't make things easier to understand.

Usually, generating and updating POT and PO files is upstream's
responsibility, which is why you'll find little documentation for translating
anything other than debconf templates. Since this is a native package, it's up
to you to do what you want. My suggestion is to run generate_pot.sh before
release; the most important thing is that it is run after the program's
messages are updated and _finalised_, and before the files are sent to
translators.

> (4)
> Then,
> msgmerge --update --backup=none --no-fuzzy-matching "${PO_FILE}" "${POT_FILE}"
> is called for every existing PO file. This doesn't move the header from 
> the POT file to the existing PO file so stupidities like "# COPYRIGHT 
> THE PACKAGE CREATOR" just never get fixed because the translators don't 
> seem to care.

The header is only touched when the translation is initially created.

> If a po file for a language already exists during this step, the already 
> existing translation gets merged into the new po file. In some 
> circumstances that I have not understood yet, the translation gets 
> "fuzzied",

A translation is marked "fuzzy" when the source string has been altered
but not entirely removed, meaning that the translated string needs to be
rechecked. This often occurs when there's a grammatical change or some
rewording in the source string.
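
In the PO file, such an entry carries a `#, fuzzy` flag comment above
its msgid, and msgfmt skips it by default until a translator has
reviewed the string and removed the flag. A small sketch for counting
those entries with plain grep; count_fuzzy is a hypothetical helper
name.

```shell
# Sketch: count entries flagged fuzzy in a PO file.
# count_fuzzy is a hypothetical helper name.
count_fuzzy() {
    # Flag comments start with "#," and may list several flags,
    # e.g. "#, fuzzy, perl-format".
    grep -c '^#,.*fuzzy' "$1"
}
```

Running `count_fuzzy po/ar.po` before and after msgmerge gives a
quick idea of how much review work a string change has caused. Note
that a header entry that is still flagged fuzzy is counted as well.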

> which I have been told causes a lot of unnecessary and 
> repeated work for the translators which I am supposed to avoid by doing 
> manual work myself which I don't understand. Not doing this work is 
> condemned as "not being nice to translators".

Nothing is wrong with making translations fuzzy; sometimes it's
necessary.

I've rarely seen anyone being condemned for fuzzy translations (but then
again, I work in a language team that has virtually no members). The
only reason this would happen is if a maintainer kept making pointless
changes to the source strings, to the point that translators are fed up
with reviewing the same strings over and over again when they could be
creating new translations.

> Basically the same applies for this step than for the POT generation 
> step, with the additional hardship that the PO files are generated, 
> being written to by a program AND STILL contain a significant part of 
> human work. I never know how much work of other people I am destroying 
> by calling msgmerge out of line.

Just make sure that you're happy with the state of messages in the
program, since removing any source strings deletes them from the
translations as well. Since Git is being used, this is theoretically
reversible, but really should be avoided in the first place.

> In which stage of package build do I do 
> msgmerge? Do I commit the merged po files, when do I commit them, what 
> do I do with them during git merge when a feature branch is merged?

In your position, I would leave the translations and the POT file
untouched on the feature branch, and only ever update them on the main
branch after merging.

> (5)
> podebconf-report-po is used to generate the calls for translation. One 
> message is sent to this mailing list with the pot file attached, and for 
> each existing po file, the translators listed in that file get an 
> individual mail with just the respective po file attached.
> 
> If the msgmerge step is forgotten, they get an already translated po 
> file that doesn't match the pot and therefore is useless.
> 
> In theory, for an already existing package, the POT file is not needed, 
> right?

Yes, in theory, but it's still helpful to attach it for a variety of
reasons, e.g. when the existing translation is an old garbled mess and
starting anew is the best option.

> (6)
> Depending on the age of the existing translations, about half of the 
> messages I send to individual translators are going to bounce. Am I 
> supposed to report that to debian-i18n@l.d.o as a followup to the 
> general translation request so that new translators can take up the 
> outdated translatorless translations? Or am I supposed to send the 
> general translation request to debian-i18n last so that I can explicitly 
> mention the translatorless languages there?

According to the manual page (and from what I've seen on
debian-l10n-ar@l.d.o), the language team listed in the PO file (which
should _always_ be debian-l10n-LANG@l.d.o) is Cc'ed by default, so they
will deal with inactive translators. Anyone working on translations
should be subscribed to the list for the relevant language, so it'll be
picked up. You don't need to do anything extra.

> (8)
> When a translator does a translation, they send me a new po file 
> containing the actual translation. If it's a new language, they start 
> with the pot file that hopefully has the correct header, and if it's an 
> existing language, they start with the old po file, which almost always 
> has a historically grown header that is in more or less dire need of 
> streamlining and cleaning. They either take the PO/POT file from the 
> e-mail attachment, use a package the pulled from the archive³ or they 
> pull the PO/POT file from git.

They really shouldn't be taking it from the archive. If you're not happy
with the version of the POT they used, you can attach the correct POT
file and ask them to fix their translations. You're not obligated to use
the file they supply.

> They usually don't bother about the header or copyright, so things like 
> package name, licenseª, Project-ID-Version and PO-Revision-Date are 
> often questionable, unclear, just plain wrong or cause extra work to 
> package maintainer because, for example, a different license was chosen 
> than the actual package is licensed under either out of incompetence or 
> not caring.
> 
> Am I supposed to fix those headers in the po file myself? Am I supposed 
> to ask the translator to fix the headers? Or am I supposed to just 
> ignore all of that and just accept whatever I get sent? I often feel 
> like a smart-mouthed know-it-all when I ask a translator to improve the 
> headers of their PO file.

What you do is up to you. Translation headers are annoyingly
inconsistent from a QA perspective, so don't feel bad for asking
translators to fix headers (or fix them yourself if it's easy enough).
I had to do this when I received translations for miniflux's debconf
templates.
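
A rough sketch of how leftover template headers could be spotted
before committing: it greps for placeholder strings that xgettext
writes into its default header. stale_headers is a made-up helper
name, and the placeholder list is not exhaustive.

```shell
# Sketch: list PO files that still contain xgettext's header placeholders.
# stale_headers is a hypothetical helper name.
stale_headers() {
    # -l prints only the names of matching files; -E enables alternation.
    grep -lE 'SOME DESCRIPTIVE TITLE|FIRST AUTHOR <EMAIL@ADDRESS>|PACKAGE VERSION|YEAR-MO-DA HO:MI\+ZONE|FULL NAME <EMAIL@ADDRESS>' "$@"
}
```

Calling `stale_headers po/*.po` prints the files whose headers were
never filled in, which makes it easier to decide whether to fix them
yourself or ask the translator.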

> (9)
> I then commit the po file the translator sent me to version control
> 
> (10)
> And then I eventually release the package.
> 
> In theory, it would probably be good to do all that regeneration when 
> preparing a package for release. Why don't we have a debian/rules target 
> like debian/rules prepare-release that might be useful for that? How 
> could we protect us against uploading a package with outdated POT/PO 
> files? People make mistakes.

I've attached a rough check script in the same diff that tells you, via
its exit code, whether your POT is outdated. You could use this in a
pre-commit hook or somewhere in d/rules to fail the build (e.g. in
execute_before_dh_auto_configure).
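
A minimal sketch of the pre-commit variant, assuming check_pot.sh
from the attached diff sits in the repository root and exits non-zero
when the POT no longer matches the sources. The function names are
hypothetical.

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit helper (hypothetical names).

pot_is_current() {
    # $1: path to the check script, defaulting to ./check_pot.sh
    "${1:-./check_pot.sh}" >/dev/null 2>&1
}

guard_pot() {
    if pot_is_current "$1"; then
        return 0
    fi
    echo "POT file is out of date; run ./generate_pot.sh and commit the result" >&2
    return 1
}

# In the real hook, the last line would be:
#   guard_pot || exit 1
```

The same script call could be reused from d/rules, so that the hook
and the build fail for the same reason.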

> How am I supposed to handle the unavoidable differences between git 
> branches, that are probably easier to solve when I am just merging a 
> feature branch but can be a major pain when merging suite branches like 
> experimental, stable, unstable where translation work has already been 
> done?

Wouldn't a "theirs" merge strategy for only the translations work?
During a merge conflict, you can use `git checkout --theirs -- po/`, or
use .gitattributes with `po/* merge=theirs` (I haven't tried this, and
as far as I know "theirs" is not a built-in merge driver, so it would
have to be defined in the git config first). This way new translations
from, for instance, debian/experimental, will replace the old ones in
debian/unstable.
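
For the .gitattributes variant, a custom merge driver is one way to
get that behaviour. This is only a sketch: the driver name "theirs"
is my own choice, and git only invokes the driver for files that were
changed on both sides of the merge.

```shell
# Sketch: define a merge driver that always keeps the incoming version.
# setup_theirs_driver is a hypothetical helper; run it once per clone.
setup_theirs_driver() {
    git config merge.theirs.name "always take the incoming version"
    # %A is the current branch's version (and where the result must end
    # up), %B is the version from the branch being merged in.
    git config merge.theirs.driver 'cp -f -- %B %A'
}

# The matching .gitattributes line would be:
#   po/*.po merge=theirs
```

Since the driver lives in the local git config, everyone who merges
suite branches would have to run the setup once in their clone.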

But honestly, if some changes are in debian/experimental or in a feature
branch, the translations should really be left alone, since no one will
use them anyway and the package is prone to further changes.

> There must be some smarter method when merging to mas^wdebian/unstable 
> than (1) move away all po files, (2) merge, (3) ignore all merge 
> conflicts in po files, (4) regenerate POT, (5) restore po files moved 
> away in step 1, (6) msgmerge, (7) do a dedicated translation commit 
> (one? or one per file?).
[...]
> Thanks for reading up to this point. Writing this message alone has cost 
> me three hours of my time that I'd rather have put in productive 
> packaging work, and a sleepless night. You know, when I blow a fuse, I 
> rant, and then I start writing docs. I guess when I put the result of 
> this discussion in a wiki page, it should be under i18n, right? I am 
> inclined to put on https://wiki.debian.org/I18n a dedicated chapter 
> titled "for package maintainers", probably between "Keyboard input 
> infrastructure support" and "Meetings" as this is a matter beyond 
> interna of the translation teams and the i18n effort. Am I on the 
> correct track with that?

IMO this information should probably go on i18n.d.o.

If you don't want to deal with translation stuff, I'm happy to help with
that aspect, and if you'd like you can offload that on me.

--
Maytham
diff --git a/check_pot.sh b/check_pot.sh
new file mode 100755
index 0000000..fa3756c
--- /dev/null
+++ b/check_pot.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+current="$(mktemp)"
+updated="$(mktemp)"
+
+perl -0777 -pe 's/.*?\n\n//s' po/adduser.pot >"$current"
+
+GENERATE_PO="0" POT_FILE="$updated" ./generate_pot.sh >/dev/null 2>&1
+perl -0777 -pi -e 's/.*?\n\n//s' "$updated"
+
+diff -q "$current" "$updated"
diff --git a/generate_pot.sh b/generate_pot.sh
index 8f7df47..98cf4d6 100755
--- a/generate_pot.sh
+++ b/generate_pot.sh
@@ -15,44 +15,24 @@
 # when they have been touched by a translator.
 
 # Define file names
-HEADER_FILE="po/header.pot"
-TEMP_FILE="po/temp.pot"
-POT_FILE="po/adduser.pot"
+POT_FILE=${POT_FILE:-"po/adduser.pot"}
 SOURCE_FILES="adduser deluser *.pm"
 
-# Create a custom header file if it doesn't exist
-if [ ! -f "${HEADER_FILE}" ]; then
-    cat <<EOL > "${HEADER_FILE}"
-# adduser program translation
-# Copyright (C) (insert year and author here)
-# This file is distributed under the same license as the adduser package.
-# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
-#
-msgid ""
-msgstr ""
-"Project-Id-Version: adduser $(dpkg-parsechangelog --show-field Version)\n"
-"Report-Msgid-Bugs-To: addu...@packages.debian.org\n"
-"POT-Creation-Date: $(date +"%Y-%m-%d %H:%M%z")\n"
-"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
-"Last-Translator: Your Name <your.em...@example.com>\n"
-"Language-Team: MyProject Team <t...@example.com>\n"
-"Content-Type: text/plain; charset=UTF-8\n"
-"Content-Transfer-Encoding: 8bit\n"
-"Plural-Forms: nplurals=2; plural=(n != 1);\n"
-EOL
-fi
-
-# Extract strings without generating a default header
-xgettext --keyword=mtx --keyword=gtx --omit-header -o "${TEMP_FILE}" --from-code=UTF-8 -L ${SOURCE_FILES}
-
-# Merge custom header and extracted strings
-cat "${HEADER_FILE}" "${TEMP_FILE}" > "${POT_FILE}"
+# Extract strings and generate POT file
+xgettext \
+    --keyword=mtx --keyword=gtx --from-code=UTF-8 -L perl \
+    --package-name=adduser --package-version=$(dpkg-parsechangelog --show-field Version) --msgid-bugs-address=addu...@packages.debian.org \
+    -o "${POT_FILE}" ${SOURCE_FILES}
+# Replace the first line with an actually useful description template
+sed -i "1s/.*/# Translation of adduser program into LANGUAGE/" "${POT_FILE}"
 
 # Clean up temporary file
 rm "${TEMP_FILE}" "${HEADER_FILE}"
 
 echo "POT file generated: ${POT_FILE}"
 
+[ "$GENERATE_PO" == 0 ] && exit;
+
 # Loop through all .po files in the locale directory
 for PO_FILE in po/*.po; do
     if [ -f "${PO_FILE}" ]; then
-- 
2.48.0.rc1.219.gb6b6757d772
