Re: Brainstorming: Can we refactor the website to make translation easier?

Rob Weir Mon, 26 Aug 2013 06:46:39 -0700

On Fri, Aug 23, 2013 at 5:12 PM, janI <[email protected]> wrote:
> On 23 August 2013 21:11, Rob Weir <[email protected]> wrote:
>
>> On Fri, Aug 23, 2013 at 12:21 PM, janI <[email protected]> wrote:
>> > On 23 August 2013 17:58, Rob Weir <[email protected]> wrote:
>> >
>> >> (Responses to [email protected], please)
>> >>
>> >> Obviously our website is quite large.  Google reports 21207 pages
>> >> indexed in the www subdomain, and a further 48075 pages in the wiki
>> >> subdomain.  But for the purpose of this post, when I talk about the "home
>> >> page" I'm talking about the contents of our main index.html and the
>> >> most commonly visited pages directly linked to it, e.g., the
>> >> why/download/product/get-involved, etc. pages.
>> >>
>> >> This core homepage content amounts to around 25 pages.
>> >>
>> >> Today this content is scattered around the content tree.  Some of it
>> >> is in the root.  Some of it is in the /why and /download directories.
>> >> Some of it is template-related and is in /templates rather than in
>> >> /content.
>> >>
>> >> As a test I tried to create my own NL page, in the fictitious "xx"
>> >> locale.  You can see it here:  http://www.openoffice.org/xx/
>> >>
>> >> It is not working correctly, but it already required a lot of
>> >> non-trivial hacking:
>> >>
>> >> 1) I had to hunt around and guess which files to copy.  Do I copy
>> >> scripts, images and CSS, or just content pages?   Some of the
>> >> directories had out-dated content that was not linked to by anyone.
>> >> It was hard to figure out what the minimum amount of content needed
>> >> was, and where it was located.
>> >>
>> >> 2) The main index.html file had to be edited to refer to CSS in the
>> >> root, rather than the current directory.
>> >>
>> >> 3) The download page is messed up, missing CSS and/or scripts.
>> >> Presumably I need to copy something into the xx/download dir, or edit
>> >> scripts to make them refer to /download off the root.
>> >>
>> >> 4) The /xx/why pages are not showing the right side navigation now.  I
>> >> must have missed something there as well.
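Problems like items 2 and 3 typically come from asset references that resolve against the page's own directory, so they break once a page is copied into /xx/. A minimal sketch of a checker that lists such references in one page, assuming the pages are plain (X)HTML; `find_relative_assets` is a hypothetical helper, not an existing site tool, and it only inspects <script>, <img> and <link> tags.

```python
from html.parser import HTMLParser

# Which attribute carries the asset URL for each tag we care about.
ASSET_ATTRS = {"script": "src", "img": "src", "link": "href"}


class AssetRefCollector(HTMLParser):
    """Collects the asset URLs referenced by <script>, <img> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.refs.append(value)


def find_relative_assets(html_text):
    """Return asset references that resolve against the page's own directory.

    Root-relative ("/styles/main.css"), protocol-relative ("//host/x.js"),
    absolute ("https://...") and data: URLs are skipped; anything else
    would break when the page is copied from / into /xx/.
    """
    collector = AssetRefCollector()
    collector.feed(html_text)
    collector.close()
    return [
        ref
        for ref in collector.refs
        if not ref.startswith("/")
        and "://" not in ref
        and not ref.startswith("data:")
    ]
```

For example, `find_relative_assets('<link rel="stylesheet" href="style.css"><script src="/scripts/main.js"></script>')` returns `['style.css']`: the stylesheet is flagged, the root-relative script is not.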
>> >>
>> >> Of course, I could figure the above out eventually.  It just requires
>> >> some time and effort and trial and error.  But none of this is
>> >> documented, and even if it were, this is a fragile approach and
>> >> probably beyond the web development skills of a typical translator.
>> >>
>> >> But we do know this has been done for some languages.  They got it to
>> >> work.  The German page is a good example:
>> >>
>> >> http://www.openoffice.org/de/
>> >>
>> >> Now this looks good, but it is still a messy thing from a maintenance
>> >> perspective.  If we make structural changes to the main English page,
>> >> then those changes need to be manually merged into every NL page.
>> >>
>> >> What can we do to improve this?
>> >>
>> >> Here's my idea:
>> >>
>> >> 1) What if we refactored the home page so it was all self-contained
>> >> in these directories: /scripts, /styles, /images and /en/?
>> >>
>> >> 2) Make the /en directory be pure content: only the stuff that needs
>> >> to be translated.  It loads everything else (scripts, images, etc.)
>> >> via URLs relative to the root, e.g., in /scripts, /styles, etc.
>> >>
>> >> 3) Reduce or eliminate any embedded Javascript within pages.  For
>> >> example, refactor the code in download/index.html so it is external
>> >> and depends on JSON resource files for translated strings.  The aim is
>> >> that translators never need to touch script.
>> >>
>> >> 4) Ultimate goal is for someone to be able to jump start a new NL home
>> >> page by simply requesting an svn copy of the /en directory, and then
>> >> editing the resulting files.  No one should ever need to do what I'm
>> >> doing with the "xx" pages.
>> >>
>> >> 5) Maintenance is far easier.  Most things, like changing the scripts,
>> >> are done in one place only.  But even changes to the HTML are easier.
>> >> Since we then have a common branch point via the svn copy, when
>> >> structural changes are added to the main /en HTML, these can be merged
>> >> more elegantly into the translated versions, using Subversion.
>> >>
>> >> 6) Via Apache redirects we can ensure that the default call to
>> >> www.openoffice.org/ goes to /en/.  Conceivably we could also do locale
>> >> detection and send requests automatically to the appropriate NL home
>> >> page.
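The locale detection in item 6 boils down to matching the browser's Accept-Language header against the NL home pages that actually exist. A minimal sketch of that matching logic, assuming the available locales are known as a set of directory names; `choose_home_page` is hypothetical, and in practice Apache's own content negotiation (mod_negotiation) or rewrite rules might do the same job without custom code.

```python
def parse_accept_language(header):
    """Return (language tag, q) pairs from an Accept-Language header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        tag, *params = [p.strip() for p in part.split(";")]
        if not tag:
            continue
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            entries.append((tag.lower(), q, position))
    # Highest quality first; equal q values keep their order in the header.
    entries.sort(key=lambda e: (-e[1], e[2]))
    return [(tag, q) for tag, q, _ in entries]


def choose_home_page(header, available, default="en"):
    """Pick the NL home page path for a request, falling back to /en/."""
    for tag, _ in parse_accept_language(header or ""):
        if tag == "*":
            break
        # Try the full tag ("pt-br"), then its primary subtag ("pt").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return f"/{candidate}/"
    return f"/{default}/"
```

A browser sending `de-DE,de;q=0.9,en;q=0.8` would be sent to /de/ if that directory exists, and to /en/ otherwise.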
>> >>
>> >> A variation on the above would be to use Pootle, rather than svn
>> >> copy/merge to maintain the translations.  But that would require the
>> >> same refactoring work to enable it.  And it would require further
>> >> investigation to identify a way of extracting and merging translation
>> >> strings in MDText files as well as (X)HTML files.
>> >>
>> >> This is obviously more than a one-person task.  So I'd be interested
>> >> in hearing what you think in general about this approach, whether
>> >> there is a simpler alternative I've missed, and whether this is
>> >> something you'd be interested in helping with.
>> >>
>> >
>> > I like a lot of your ideas, let me add my own experience.
>> >
>>
>> Thanks.
>>
>> > If our pages contain no text, because it is all outsourced to one
>> > or more json objects, then translation becomes easy, and the pages
>> > themselves stay simple. When the url is called without arguments the
>> > "en-json" is used, and if called with lang="xx" "xx-json" is used.
>> >
>>
>> I like the idea of content/code separation.  We certainly do this in
>> the code, for example.  But there are two challenges to taking this
>> approach all the way with the website.
>>
>> 1) If we do JSON everywhere then we have a JavaScript dependency
>> everywhere.  This has an impact for visibility of the pages to search
>> engines, but there are workarounds.  But it may be a bigger issue for
>> users who block Javascript.
>>
> No, we would do that solely on the server side; it would not be a good idea
> to have JS retrieve the json objects.
>
> We could e.g. use PHP that retrieves the correct json object and
> transmits a finished page.
>


OK.  We're on the same page.
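A minimal sketch of that server-side variant: choose the string table from a `lang` request parameter, fall back to English for anything missing, and fill the page template before it is sent. Python stands in here for the PHP Jan suggests; the `<strings_dir>/<lang>.json` layout and the function names are hypothetical. The language code is checked against a strict pattern before it is used in a file path, since a runtime lookup like this is exactly where security problems tend to creep in.

```python
import json
import re
from pathlib import Path
from string import Template

# Accept only simple codes such as "de" or "pt-br"; anything else (for
# example "../etc") never reaches the filesystem.
LANG_CODE = re.compile(r"[a-z]{2,3}(?:-[a-z]{2,4})?")


def load_strings(strings_dir, lang, default="en"):
    """Return the string table for lang, with missing keys taken from default."""
    strings_dir = Path(strings_dir)
    table = json.loads((strings_dir / f"{default}.json").read_text(encoding="utf-8"))
    if lang and lang != default and LANG_CODE.fullmatch(lang):
        path = strings_dir / f"{lang}.json"
        if path.is_file():
            # Translated strings override English; untranslated keys keep English.
            table.update(json.loads(path.read_text(encoding="utf-8")))
    return table


def render_page(template_text, strings):
    """Fill $key placeholders verbatim; unknown placeholders are left untouched."""
    return Template(template_text).safe_substitute(strings)
```

Because the English table is loaded first and the translation is merged over it, a partially translated NL page still renders, with the untranslated strings in English. `safe_substitute` is used so a stray placeholder does not break a live request.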

>
>> 2) There may be cases where a translation requires direct access to
>> the HTML or CSS.  For example, I think the Tamil translation needed
>> access to specify a particular font.  And for some languages they might
>> need to set text direction to RTL.   These kinds of things make almost
>> any approach more complicated.
>>
>
> Look at e.g. our mwiki that handles those details all on server side.
>
> And just as a suggestion, if we were to use WordPress, things like fonts
> would be solved. WP also has a possibility (not json) for multi language,
> which I could easily adopt in genLang (for translation).
>
>
>>
>> So the question we need to answer is: how far do we take this?  I think
>> we have some examples where the code is so intertwined with the text
>> that translation becomes very hard and risky.  For example, the
>> generation of the "boxes" on the download page.   But then we have
>> some other pages, especially the MDText pages, where I would be
>> comfortable handing it directly to a translator and expect they could
>> edit it without breaking anything.
>>
> We can always find examples where it becomes hard, but typically you can
> reformulate the problem so it fits in a standard (boxes are no real
> problem). The only problem I see is with JS, where we ask and get answers,
> e.g. YN.
>
>
>>
>> The Javascript dependency might be broken if we make this be a CMS
>> build-time text replacement rather than a runtime/Javascript
>> replacement.  So the CMS would detect when the Pootle files change and
>> automatically generate new HTML pages from them.  But even then we
>> still would need some runtime integration of strings, specifically on
>> the download page where language and OS are determined at runtime
>> based on browser request headers.
>>
> I would consider not using the CMS, because we basically don't need it.
>

So within the realm of server-side software, what is possible?  My
impression was that Infra generally cautions against runtime server-side
execution due to the greater opportunity for security problems.
That's why we're using build-time page generation.  This approach also
performs well, since it is static HTML pages at runtime, and is very
stable.
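The build-time approach can be sketched the same way: render every locale's string table into static HTML once, so the web server only ever serves files. This is a minimal sketch, not the CMS's actual build; the page skeleton is hypothetical, and the per-locale metadata (`dir`, `font`) is an assumed way to cover the RTL and Tamil-font cases raised earlier.

```python
from pathlib import Path
from string import Template

# Hypothetical page skeleton; assets come from root-relative URLs so the
# same markup works under any /<lang>/ directory.
PAGE = Template(
    '<html lang="$lang" dir="$dir">\n'
    '<head><link rel="stylesheet" href="/styles/main.css"></head>\n'
    '<body style="font-family: $font"><h1>$title</h1></body>\n'
    "</html>\n"
)


def build_site(out_dir, locales, default_meta):
    """Write <out_dir>/<lang>/index.html for every locale, once, at build time.

    locales maps a language code to its strings; a locale may also override
    entries of default_meta such as "dir" (text direction) or "font".
    Strings are inserted verbatim, i.e. treated as trusted HTML fragments.
    """
    written = []
    for lang, strings in sorted(locales.items()):
        values = {**default_meta, **strings, "lang": lang}
        target = Path(out_dir) / lang / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        # substitute() raises KeyError on a missing string, failing the build.
        target.write_text(PAGE.substitute(values), encoding="utf-8")
        written.append(target)
    return written
```

`Template.substitute` is used here rather than `safe_substitute` so that a missing string fails the build loudly instead of publishing a page with a raw `$title` in it.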

But in any case, I think the refactoring work is approximately the
same thing regardless of how the pages are generated.

-Rob


> rgds
> jan I.
>
>>
>> -Rob
>>
>> > If we use json objects, then Pootle becomes an elegant tool for
>> > translation, because it knows how to handle xml, and if we want to stay
>> > with po files it's about 1 day of work in genLang.
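If the strings stay in JSON but translators keep working with po files in Pootle, the conversion is largely mechanical. A minimal sketch, assuming a flat JSON object mapping a key to its English string; each key becomes the msgctxt, so the same English text used on two pages can still be translated differently. This is not genLang's format, and a full converter (the Translate Toolkit ships one, json2po) handles more edge cases.

```python
import json


def po_quote(text):
    """Quote a string for a PO file, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def json_to_po(json_text, translations=None):
    """Convert a flat {key: english} JSON table into PO file text.

    translations optionally maps keys to already translated strings;
    keys without one get an empty msgstr for the translator to fill in.
    """
    translations = translations or {}
    source = json.loads(json_text)
    # Minimal PO header declaring the file encoding.
    lines = ['msgid ""', 'msgstr "Content-Type: text/plain; charset=UTF-8\\n"', ""]
    for key, english in source.items():
        lines.append(f"msgctxt {po_quote(key)}")
        lines.append(f"msgid {po_quote(english)}")
        lines.append(f"msgstr {po_quote(translations.get(key, ''))}")
        lines.append("")
    return "\n".join(lines)
```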
>> >
>> > A number of top companies (incl. the one I used to work for) do it like
>> > this; they of course then hire a translator to translate the json
>> > objects.
>> >
>> > Splitting functionality and text is the key; when that's done the rest is
>> > trivial work.
>> >
>> > This will of course make the CMS a bit overkill, but I can live with that :-)
>> >
>>
>>
>>
>>
>> > rgds
>> > jan I.
>> >
>> >
>> >>
>> >> Regards,
>> >>
>> >> -Rob
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >>
>>
