Thank you for bringing me into your conversation. To give some background, I am Jean-Christophe Helary, a French professional translator who has been working on pushing free software into professional translation for two decades, mostly by promoting and contributing to OmegaT (https://omegat.org), for which I currently act as project coordinator, but also, to a much lesser extent, to the Okapi Framework, Maxprograms, etc. I also try to work the other way round, by pushing and promoting free professional computer-aided translation tools into free software translation processes (with much less success, for practical reasons: most free software translation contributors do that on the side and few are willing to invest time in learning tools that professionals use).
I’ve been the maintainer of the French Emacs manuals translation project for a while now, which I’ve been rebooting during Covid. I’m also active in the po4a translation project, etc.

> On Feb 29, 2024, at 7:15, Pádraig Brady <p...@draigbrady.com> wrote:
>
> I see emacs recently discussed translating their texinfo manuals at:
> https://lists.gnu.org/archive/html/help-texinfo/2024-01/msg00057.html
> As it stands I only see one file for one language at:
> https://git.savannah.gnu.org/cgit/emacs.git/tree/doc/translations

The point here is having humans contributing to a project. You have not failed to notice that Emacs is used around the world even without having translations or being localized (beyond its tutorial).

Also, there is only one file for a French manual there, not because there are no existing translations of the manuals, but rather because it is precisely that French manual that triggered the discussion of “how do we handle translations and how do we publish them?” which was kind of the roadblock here.

There are or have been translation efforts in at least French, Japanese and Chinese, and I have no doubt that once we have modified the build process to install the various manuals, we’ll have more projects slowly organizing.

The Emacs manuals are about 2 million words. Five hundred words a day each, with a team of 10 committed people, is about a year of work. Which is nothing. Make it 20 people and you have that in 6 months.

Doing that is more than providing a translation, it is increasing people’s skills and understanding of complex processes. It is creating a community of people who understand issues of free software and are not mere consumers.

In one of the threads to which the discussion that you quoted belonged, somebody noted that LibreOffice has a 6-million-word manual that is translated into a dozen languages.
Of course, that’s because Star Office started early, then Sun Microsystems pushed the effort further by spending money on LSPs (already including some kind of reviewed machine translation), etc., and then LibreOffice took over the existing volunteer teams and they now have highly experienced people who handle that.

And because translation/localization is a very low friction entry point into contributing to the community, that actually generates code contributions (my own “main” contribution to Emacs is a fix to packages.el because its output was full of singular/plural errors for corner cases).

> Now stepping back a bit, perhaps at this stage rather than
> persisting specific translations of specific snapshots of the docs,
> perhaps we should leverage the increasingly sophisticated translations
> provided by LLMs, to provide more up to date and varied translations.

This is a process/promotion/human issue. If the FSF decided to invest money in translation management and promotion, I am sure lots of issues that we have would go away.

Also, I’m sure that the GNU project would object to massive use of LLM output, considering that current LLMs are basically huge copyright infringers and that they are just playing on the enormity of what they did to get a free international pass.

LLMs do not create communities. They feed on communities and they are not accountable for the huge externalities that they produce.

LLM output costs "nothing". Which means that individual users already have access to that. In fact, I argued exactly that to the Linux Foundation JA office yesterday. Providing LLM-based translation is not doing a service to users. It is also dangerous because LLM output is strangely false in weird and unexpected places, and short of a human review service that I doubt the GNU project would be willing to provide, there is nothing that would keep those errors from spreading in the wild, at a real cost that you can’t imagine.
LLMs do *not* provide “more up to date and varied translations”. They provide “probable strings that they do not understand, but it looks human enough that a human can be tricked into thinking that a human who understands the subject matter actually wrote that”.

It would be nice to put down on paper what LLMs actually stand for and discuss that before suggesting their wholesale use in the GNU project.

> It would be cool to integrate that seamlessly into the GNU info reader
> and/or online versions of the manual.

Based on which non-copyright-infringing LLM?

> For illustration, ChatGPT gave this for the start of the ls manual:

I don't read Chinese but I can reverse translate that with some LLM system. I guess it is good enough to understand what ls is about.

- Now do that to the whole page and send that to a professional Chinese native computer user for comments.
- While you're at it, if you consider that the LLM output is good enough for Chinese users, try to reverse translate that to English and ask yourself if you'd find that acceptable in an official GNU manual.
- And also, why not use LLMs to actually produce manuals in English? Would you support that?

What was the effort required for you to produce that output? What additional benefit would such a “service” provide to users who already have access to such services for free?

I think those are valid questions. If LLMs came at such a high cost that only institutions could access their output, it would make some sense in some cases to provide such a service.

Also, LLM providers, and especially the one behind ChatGPT, are currently engaged in a global environmental destruction project at a time when we need to stop burning fossil fuel. So let’s please not promote their use in places such as the GNU Project, which has diametrically opposed objectives in terms of human liberation.

-- 
Jean-Christophe Helary
@jchel...@emacs.ch
https://sr.ht/~brandelune/