To: OLPC Localization list
Cc: Deb18n and Translate Toolkit lists, hoping to pick their brains

Background: the XO laptop software is available in Vietnamese. An OLPC project 
in Vietnam asked me for translated manuals.

Chris, thank you for your detailed reply. My comments are interleaved below.

On 26/03/2010, at 2:57 AM, Chris Leonard wrote:

> On Wed, Mar 24, 2010 at 2:56 AM, Clytie Siddall <cly...@riverland.net.au> 
> wrote:
> 
> 1. Has anyone translated any part of these [OLPC] manuals into Vietnamese?
> 
> 2. If not, can we get this text on Pootle (e.g. using po4a [1] which converts 
> many doc formats into PO files)?
> 
> So, Clytie, this is a more expansive answer to your question about localizing 
> the manuals into Vietnamese and the tools available to do so.
> 
> For long form texts (like manuals and web-sites) it isn't quite as simple as 
> just scraping the translatable strings with a tool like po4a and posting the 
> PO file on Pootle.  The eco-system of tools is simply not as complete or 
> mature as it is for code internationalization (i18n) and localization (L10n) 
> and perhaps even more importantly, the familiarity and experience with 
> support of po4a doesn't exist within the OLPC community at present.

Taking on any new process is a barrier, and doc-l10n is definitely a complex 
task.
> 
> For code, there is a fairly mature eco-system of tools that make a 
> coordinated / distributed publication process simpler for localizers and 
> developers alike.  Tools like gettext assist in the first phase of i18n 
> (preparing PO files) and there are also methods for connecting Pootle up to 
> software repositories (like git) that can help keep the repo and the PO files 
> concurrent and synchronized.  After L10n is completed, there are further 
> mechanisms for committing the completed PO file back to the repository where 
> additional i18n steps occur in the build-creation phase of release 
> publication (e.g. generation of MO files and language packs). 
> 
> Even so, it takes a fair amount of manual intervention behind the scenes to 
> make all of this work.  By-and-large, Sayamindu carries most of that burden 
> by himself, which is why I've chosen to work on various administrivia aspects 
> of the Sugar Labs / OLPC localization effort, so that he can focus on the 
> absolutely critical stuff that he is uniquely qualified to do.  We are 
> fortunate that for the most part our developers are committed to doing their 
> part with respect to i18n, particularly the core Sugar developers and a good 
> handful of Activity developers, but there is still room for improvement in 
> terms of a number of individually contributed Activities and getting them set 
> up for inclusion in Pootle. This is mostly an ongoing education challenge and 
> not a technical issue.

Indeed. Even long-established i18n projects are still evolving at this level. 
The whole distributed nature of FLOSS, and its voluntary contributions, can 
make it difficult to establish standards and change practice. However, a few 
enthusiasts can achieve a great deal. Sayamindu is a valiant example: without 
him investing the time to learn about Pootle, set it up and support it, many 
OLPC localizations would not exist. I wouldn't even know about OLPC: I only 
found out about it via the Pootle list, IIRC. The leading FLOSS i18n projects 
owe much of their success and virtually all of their progress to a few 
dedicated volunteer coordinators. So we can do it, but it takes time and effort 
to build a workable and sustainable process.
> 
> As far as long-form HTML-based text goes, there is something of a gap in the 
> maturity of the tools, particularly with respect to the interfaces between 
> the i18n and L10n (and then back to i18n) process, and while it is true that 
> po4a is an attempt to address this, it is simply not at the same 
> "plug-n-play" level as a toolset as what is currently available for code 
> L10n.  No insult meant to the developers of po4a (I applaud and appreciate 
> their work), but they themselves refer to these issues in the rather 
> extensive manpage below.  Reading it gives you some flavor for the 
> complexities involved.
> 
> http://po4a.alioth.debian.org/man/man7/po4a.7.php
> 
> It may seem a little unfair, but by the "eat your own cooking" standard, if 
> you look at the repo for po4a, they only have their own documentation 
> localized into a small handful of European languages, which must say 
> something about the tool itself or at least the overall context in which it 
> currently exists.

As a Debian translator, I can tell you I haven't translated that manpage 
because I haven't had the time. It's a matter of priorities for what resources 
you have available. I'd be interested to hear what the larger language teams 
have to say. Odds are a briefer manpage would get translated first.

>   Their L10n process still involves many manual steps that play themselves 
> out on their e-mail lists, and this is not a truly scalable solution.

I agree with you entirely on HTML and websites in general. They are a pain to 
translate. It's very difficult to keep up with changes. I don't really know any 
project which handles website-translation effectively, although Debian probably 
comes closest: the pages are accessible via source control, there is good 
documentation on what to do, you have an RSS feed and diff for updates, but 
you're still working with wml files and no segmentation or effective metadata. 
I tend to avoid webpages for this reason. Nearly all web/wiki pages I've 
translated initially have languished without update due to the difficulty of 
following the pages up, compared to normal PO/XLIFF update. Of the few web/wiki 
pages I currently maintain:

1. Scratch website [1]
The files are on their Pootle. I have no idea what l10n process they go 
through, but I translated this website simply because it was promoting free 
software and available in PO format, so I could use my offline editor or 
Pootle. Time will show if it gets updated regularly. I have yet to see my 
translation on the website. The quality of the original strings was appalling, 
showing that review is an essential part of the L10n process.

2. TuxPaint website [2]
These files were available on the Locamotion Pootle, during the Decathlon 
project. I could use my existing workflow. We are currently setting up an 
ongoing process to update the files and continue using Pootle (I volunteered to 
support this project). Wordpress is also available on the Locamotion Pootle, so 
I translated that as well (hopefully someone will volunteer to support it).

3. Creative Commons, Frugalware, GNU/Linux Matters – all available on their 
individual Pootles.

Manuals and other docs are more accessible in some projects.

1. GNOME makes all manuals/docs available in PO format right next to the 
application files they support (e.g. [3]) in their translation interface, and 
can also be updated and committed via git. Even though we haven't had the 
resources to do many of the docs yet, we have done the licences, for example, 
and the fact that the docs are available in PO format, fully as easy to grab 
and as up-to-date as the application files, means they are more accessible to 
us than other docs. They will get translated simply because they are there and 
we can use an effective editing and update process.

2. KDE does the same as GNOME (initially inventing the stats/access interface, 
but now GNOME have invented their "Damned Lies" interface which I find 
superior). It works well for them. We haven't had the resources to start on the 
docs yet, but we will when we do.

3. Debian does get more results. AFAICT, all the manuals, manpages, 
release announcements, reference cards etc. (that's a truly enormous mass of 
documentation) are available in PO format via SVN. There are status webpages, 
reminders on the list, build logs, almost-live web display, plus I get 
RSS/email reminders when they need updating, and can just pull the updated file 
from source control, then commit my changes. While source control access can be 
a barrier to added participation, it works well for translators who are 
accustomed to it. I maintain more docs at Debian than anywhere else, simply 
because they are available, always updated, and the Deb18n project itself is 
(in my experience) the leading internationalization project in FLOSS. It 
reviews original strings, uses more recent gettext tools (e.g. msgid-previous, 
so you can compare the previous original string to the changed one) and invents 
its own tools, follows up on translation implementation, is extremely 
well-executed and innovative, and it is generally a pleasure and a satisfaction 
to volunteer for Deb18n. Quality is their highest priority. So, despite the 
initial load of setting up the po4a etc. process for their docs, the results 
have IMNSHO been worth it. However, economy of scale comes into play with 
projects as large as Debian. Create po-debconf or po4a, put in all the work to 
implement it, and the output grows. This may not apply as well to a small 
project, although in my experience the investment in tools and lowering access 
barriers for translators is generally a good return.

4. Pootle's docs are available on the Locamotion Pootle, and are 
fully-translated and updated in Vietnamese. The wiki pages are also going to be 
on Pootle, but meanwhile, I haven't been able to keep up with the updates to 
those pages. I find the contrast telling in this context, since the wiki 
information is often of higher (and more topical) priority. I would certainly 
work on those pages if I had a viable translate/update process, with string 
segmentation.

5. I also maintain occasional docs where the developer sends me update emails. 
These docs get lower priorities, however, because I have to work with a page of 
text and a manual diff, rather than a translation format. It's simply more 
time-consuming and more liable to create errors. Also, it's more difficult to 
maintain translation memory.
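
To illustrate the difference between a previous-msgid comparison and a manual 
diff of a whole page, here is a minimal Python sketch (standard library only). 
It assumes a fuzzy PO entry of the kind `msgmerge --previous` leaves behind, 
with the old source string in a `#| msgid` comment. The sample entry, the 
helper names `read_entry` and `changed_words`, and the single-line-field 
simplification are all illustrative assumptions, not part of any Debian or 
gettext tool.

```python
import difflib

# Hypothetical fuzzy PO entry; the strings are invented for illustration.
PO_ENTRY = '''\
#, fuzzy
#| msgid "Press the power button to start the laptop."
msgid "Press the power button to start the XO laptop."
msgstr "Bấm nút nguồn để khởi động máy tính."
'''


def read_entry(text):
    """Collect the single-line fields of one PO entry into a dict.

    Simplification: multi-line (continued) strings are not handled.
    """
    fields = {}
    for line in text.splitlines():
        for prefix, key in (("#| msgid ", "previous"),
                            ("msgid ", "msgid"),
                            ("msgstr ", "msgstr")):
            if line.startswith(prefix):
                # Drop the prefix and the surrounding double quotes.
                fields[key] = line[len(prefix):].strip()[1:-1]
                break
    return fields


def changed_words(old, new):
    """Return (removed, added) word lists between two source strings."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_words[j1:j2])
    return removed, added


entry = read_entry(PO_ENTRY)
removed, added = changed_words(entry["previous"], entry["msgid"])
print("removed:", removed)
print("added:", added)
```

With the change isolated to one segment, the translator only has to revisit 
the one string that changed, and the existing msgstr stays attached to it for 
translation memory.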

So, Pootle and PO/XLIFF conversions have worked for me with docs. Manual 
maintenance of pages of text has not. I recognize the amount of work necessary 
to set up the level of support we translators need.
> 
> Importantly, a significant aspect of the maturity gap has to do with user 
> training as much as the tools themselves.  As a rule, FOSS software 
> developers have accepted the importance of localization and are easily 
> induced to do their part on i18n

Ha! I can feel many i18n project coordinators rolling over in their graves (the 
ones who died of frustration, that is). Easily? Try "after a great deal of 
blood, sweat and tears". ;)

> but authors (both web-page developers and part-time documentation writers) 
> have not yet been as well-indoctrinated by the localization community and so 
> there is often a disconnect between the format of the document as produced by 
> the authors as well as the tools they use to publish it (e.g. their 
> site-hosting or wiki for instance) and the processes involved in i18n-L10n.  
> Mostly this means that there is much more manual intervention needed because 
> the automation of these tools for a highly distributed environment like Sugar 
> Labs / OLPC has not reached the same level of sophistication on the back-end, 
> in order to provide the simplest interface for developers and localizers on 
> the front-end.

Actually, many application developers neglect their own documentation. Either 
they aren't comfortable with explanatory writing and user support, or they 
simply give it lower priority due to ignorance ("Everyone knows how to do X") 
or lack of time. As with translators, the app strings come first. So docs need 
better support in general.

I agree, however, that many people creating documents rather than apps may not be 
accustomed to localization as an essential step in the process. And we prefer 
"supported" to "indoctrinated". ;)

There is indeed a "disconnect", even a series of them. I haven't been 
particularly impressed even with the internationalization built into a CMS 
like Drupal. We don't have a workable process to replace the app-based PO 
setup. Everyone flounders around with their own process, and so far nothing 
really works for the entry-level doc writer.
> 
> The "right" way to address the localization of long-form text is a challenge 
> that has been kicked around in a number of OLPC forums, the library list (for 
> HTML-based content), the wiki gang (for wiki L10n) as well as the Support 
> Gang list (for the localization of manuals and other documentation).  To 
> date, no clear consensus around tools or methods has appeared.  Solutions 
> remain a mix of one-off and intensively manual processes.   There have been 
> various attempts to work towards using PO-file based methods (like an 
> early attempt with the OLPC web-site itself), but they have not gained real 
> momentum or sustainability.
> 
> The FLOSS manuals setup for localization has been explored, but I don't think 
> it could be called truly successful (while different languages have been set 
> up, the number of those completed and published is not great).  Whether that 
> is due to the tools or the challenge of coordinating the human resources 
> involved is not entirely clear to me.

I've registered there, joined Yet Another Mailing List™, and will give you some 
feedback once you provide the reviewed manuals on which you recommended I 
begin. From what I've seen of the FLOSS Manuals site [4] so far, I have 
concerns about access control (spam or inexperienced translating, and 
overwriting of existing translations) and the ability to use translation 
memory, back up your work, or get workable diffs with updates. So far, I haven't 
received any answers to specific questions on those issues on the FLOSS Manuals 
list, although they have been quick to welcome and support participation. I 
think it is likely that the site is set up more for writing/publishing than 
localizing.

>   Obviously, there are real advantages to be gained by providing a single 
> interface (Pootle) to localizers, but all of the necessary pieces have not 
> yet been assembled to enable that for long-form text L10n work.
>  
> I wish there was a happy answer to your question.  I do think that some 
> individual elements of the solution are available (like po4a), but it will 
> take some considerable effort by a fair number of people to establish and 
> then maintain a long-form text publication process that works as smoothly as 
> the current code L10n set-up.  Given that neither Sugar Labs nor OLPC has 
> genuinely taken ownership of the textual content creation and curation 
> process, I'm not sure that the committed resources to accomplish an overall 
> solution will be forthcoming and so this remains a challenge that is not 
> adequately addressed.
> 
> Those are just my thoughts, which are not intended as criticisms but as an 
> acknowledgement of the status quo.  There are some content-creation projects 
> related to health education on the XO laptop that I would love to take on, 
> but I've put them on the back-burner because I don't think they would be 
> sustainable (in a useful, localized form) with the level of resources I could 
> commit by myself.
> 

It comes down to resources every time (would short-term investment, e.g. a 
grant or Google SoC project, make a difference?). Thank you very much for your 
thoughts. This is really a pan-project issue, and one affecting all of us.

I'd be interested to see suggestions from translators, coordinators and the 
Pootle/Translate-Toolkit devs on what we really need to make doc translations 
effective and sustainable. Can we simplify or build on the 
po4a/Translate-Toolkit/Pootle process? Can we integrate other existing XLIFF 
tools? What works for you? What would work better?
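
As a starting point for that discussion, here is a rough Python sketch 
(standard library only) of the core round trip any such process has to 
automate: segment a long-form text into paragraphs, write each segment out as 
a PO entry, and later rebuild the document from the translations. This is not 
how po4a or the Translate Toolkit are implemented. The function names, the 
`manual.txt` file name, the sample text, and the use of paragraph numbers in 
the `#:` reference are all assumptions made for illustration, and a real PO 
file would also need a header entry.

```python
def segment(text):
    """Split plain text into paragraphs on blank lines, joining wrapped lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def po_escape(s):
    """Escape backslashes and double quotes for a PO string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def to_po(paragraphs, source_name):
    """Emit one untranslated PO entry per unique paragraph (header omitted)."""
    entries, seen = [], set()
    for number, para in enumerate(paragraphs, start=1):
        if para in seen:  # a PO file must not repeat a msgid
            continue
        seen.add(para)
        entries.append(
            f'#: {source_name}:{number}\n'
            f'msgid "{po_escape(para)}"\n'
            f'msgstr ""\n'
        )
    return "\n".join(entries)


def translate(paragraphs, catalog):
    """Rebuild the document, falling back to the source for untranslated segments."""
    return "\n\n".join(catalog.get(p) or p for p in paragraphs)


# Hypothetical two-paragraph manual excerpt.
MANUAL = '''\
Press the power button
to start the laptop.

Click the "Home" icon.
'''

paragraphs = segment(MANUAL)
po_text = to_po(paragraphs, "manual.txt")
print(po_text)

# Hypothetical partial translation: only the first segment is done.
catalog = {paragraphs[0]: "Bấm nút nguồn để khởi động máy tính."}
rebuilt = translate(paragraphs, catalog)
```

Falling back to the source string for untranslated segments means a partly 
translated manual can still be published, which is one of the things a page 
of raw text cannot offer.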

from Clytie 

Vietnamese Free Software Translation Team

[1] http://scratch.mit.edu/

[2] http://www.tuxpaint.org/

[3] http://l10n.gnome.org/teams/vi

[4] http://en.flossmanuals.net/FLOSSManuals/TranslatingAManual
