Re: On Translation Issues

Jean-Christophe Helary Wed, 28 Feb 2024 17:52:51 -0800

Thank you for bringing me into your conversation.

To give some background, I am Jean-Christophe Helary, a French 
professional translator who has been working on pushing free software 
into professional translation for two decades, mostly by promoting and 
contributing to OmegaT (https://omegat.org) for which I currently act 
as project coordinator, but also, on a much lower level, the Okapi 
Framework, Maxprograms, etc. I also try to work the other way round: by 
pushing and promoting free professional computer aided translation 
tools into free software translation processes (with much less success, 
for practical reasons: most free software translation contributors do 
that on the side and few are willing to invest time in learning tools 
that professionals use).

I’ve been the maintainer of the French Emacs manuals translation 
project for a while now, which I’ve been rebooting during Covid. I’m 
also active in the po4a translation project, etc.

> On Feb 29, 2024, at 7:15, Pádraig Brady <[email protected]> wrote:

> I see emacs recently discussed translating their texinfo manuals at:
> https://lists.gnu.org/archive/html/help-texinfo/2024-01/msg00057.html
> As it stands I only see one file for one language at:
> https://git.savannah.gnu.org/cgit/emacs.git/tree/doc/translations

The point here is having humans contributing to a project. You have not 
failed to notice that Emacs is used around the world even without 
having translations or being localized (beyond its tutorial).

Also, there is only one file for a French manual there, not because 
there are no existing translations of the manuals, but rather because it 
is precisely that French manual that triggered the discussion of “how 
do we handle translations and how do we publish them?” which was kind 
of the roadblock here. There are or have been translation efforts in at 
least French, Japanese and Chinese and I have no doubt that once we 
have modified the build process to install the various manuals, we’ll 
have more projects slowly organizing.

The Emacs manuals are about 2 million words. Five hundred words a day 
with a team of 10 committed people is about a year of work. Which is 
nothing. Make it 20 people and you have that in 6 months. Doing that is 
more than providing a translation, it is increasing people’s skills and 
understanding of complex processes. It is creating a community of 
people who understand issues of free software and are not mere consumers.
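
As a rough check of the arithmetic above, here is a minimal sketch. It 
assumes the figures quoted in this message (about 2 million words, 500 
words per person per day) and that every translator meets the quota 
every day; the function name is only illustrative.

```python
import math


def days_to_translate(total_words, words_per_day, translators):
    """Days needed if every translator meets the daily quota every day."""
    return math.ceil(total_words / (words_per_day * translators))


MANUAL_WORDS = 2_000_000  # rough size of the Emacs manuals, as stated above
DAILY_QUOTA = 500         # words per person per day, as stated above

for team in (10, 20):
    print(team, "people:", days_to_translate(MANUAL_WORDS, DAILY_QUOTA, team), "days")
```

With these assumptions, 10 people need 400 days (a little over a year) 
and 20 people need 200 days (a bit more than 6 months).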

In one of the threads to which the discussion that you quoted belonged, 
somebody noted that LibreOffice has a 6-million-word manual that is 
translated into a dozen languages. Of course, that’s because StarOffice 
started early, then Sun Microsystems pushed the effort further by 
spending money on LSPs (already including some kind of reviewed machine 
translation), etc., and then LibreOffice took over the existing 
volunteer teams and they now have highly experienced people who handle 
that. And because translation/localization is a very low friction entry 
point into contributing to the community, that actually generates code 
contributions (my own “main” contribution to Emacs is a fix to 
packages.el because its output was full of single/plural errors for 
corner cases).

> Now stepping back a bit, perhaps at this stage rather than
> persisting specific translations of specific snapshots of the docs,
> perhaps we should leverage the increasingly sophisticated translations
> provided by LLMs, to provide more up to date and varied translations.

This is a process/promotion/human issue. If the FSF decided to invest 
money in translation management and promotion, I am sure lots of issues 
that we have would go away.

Also, I’m sure that the GNU project would object to the massive use of 
LLM output, considering that current LLMs are basically huge copyright 
infringers and that they are just playing on the enormity of what they 
did to get a free international pass. LLMs do not create communities. 
They feed on communities and they are not accountable for the huge 
externalities that they produce.

LLM output costs "nothing". Which means that individual users 
already have access to that. In fact, I argued exactly that to the Linux 
Foundation JA office yesterday. Providing LLM based translation is not 
doing a service to users. It is also dangerous because LLM output is 
strangely false in weird and unexpected places, and short of a human 
review service that I doubt the GNU project would be willing to provide, 
there is nothing that would keep those errors from spreading in the wild, 
at a real cost that you can’t imagine.

LLMs do *not* provide “more up to date and varied translations”. They 
provide “probable strings that they do not understand, but it looks 
human enough that a human can be tricked into thinking that a human who 
understands the subject matter actually wrote that”.

It would be nice to put down on paper what LLMs actually stand for and 
discuss that before suggesting their wholesale use in the GNU project.

> It would be cool to integrate that seamlessly into the GNU info reader
> and/or online versions of the manual.

Based on which non-copyright-infringing LLM?

> For illustration, ChatGPT gave this for the start of the ls manual:

I don't read Chinese but I can reverse translate that with some LLM 
system. I guess it is good enough to understand what ls is about.

- Now do that to the whole page and send that to a native 
Chinese-speaking professional computer user for comments.

- While you're at it, if you consider that the LLM output is good enough 
for Chinese users, try to reverse translate that to English and ask 
yourself if you'd find that acceptable in an official GNU manual.

- And also, why not use LLMs to actually produce manuals in English? 
Would you support that?

What was the effort required for you to produce that output? What 
additional benefit would such a “service” provide to users who already 
have access to such services for free? I think those are valid questions.

If LLMs came at such a high cost that only institutions could access 
their output, it would make some sense in some cases to provide such a 
service. Also, LLM providers, and especially the one behind ChatGPT, 
are currently engaged in a global environmental destruction project at 
a time when we need to stop burning fossil fuel. So let’s please not 
promote their use in places such as the GNU Project which has 
diametrically opposed objectives in terms of human liberation.

-- 
Jean-Christophe Helary
@[email protected]
https://sr.ht/~brandelune/
