Hi Gintautas
Gintautas Miliauskas wrote:
I do not really care about the actual implementation of the backend.
The reason I am advocating an RDBMS is to influence the design of the
backend. In the current situation, an API that just presents parsed
files and writes objects back would do, but it would not be possible to
have an efficient backend based on a database this way.
Independently of the backend storage, FOSS projects and translators (the
users and customers of the system) work with files, which must interact
with Gettext, translation editors, etc. If the API does not produce
files, then another layer on top will have to do it, creating a new
unnecessary layer. Same for the XML-RPC. If we want to exchange data
between servers, this data will relate to specific projects, and the
data will have to be grouped in the files that the project produces.
Only in the case of the on-line editor, strings might be addressed
directly, without having to construct the file, but contextual
information will still be necessary in many cases. For example, when
translating a segmented text (such as a help page), it is important to
know the preceding sentences and the ones that follow.
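As a rough illustration of that contextual information, here is a
minimal Python sketch. The function name context_window and the
list-of-sentences representation are assumptions made for this
example, not part of Pootle.

```python
def context_window(segments, index, before=2, after=2):
    """Return (preceding, current, following) for the segment at `index`.

    `segments` is assumed to be the ordered list of sentences of one
    document, e.g. a help page split into translatable segments.
    """
    if not 0 <= index < len(segments):
        raise IndexError("segment index out of range")
    start = max(0, index - before)
    preceding = segments[start:index]
    following = segments[index + 1:index + 1 + after]
    return preceding, segments[index], following


help_page = [
    "Open the File menu.",
    "Choose Save As.",
    "Type a name for it.",
    "Press Enter.",
]

# The translator of "Type a name for it." needs the neighbours to know
# what "it" refers to.
preceding, current, following = context_window(help_page, 2, before=1, after=1)
```

An on-line editor that addresses single strings would still need
something like this to show the neighbouring segments.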
Here's a single simple functional requirement that I propose for the
backend:
=============
It should be possible to write a separate program which should be able
to do everything possible only using the backend object. No
assumptions on files, no nothing, just manipulating objects. I would
not have to care about locking, other servers running in parallel
using the database, the OS, the filesystem or anything else. That also
includes uploads of new languages / new projects.
=============
Specifically, say I want an XML-RPC server. I just write a Python
script that imports some classes from pootle.backend (or whatever),
defines some functions and makes them public.
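A minimal sketch of what such a script could look like, using Python's
standard xmlrpc.server module. The class InMemoryBackend and its methods
are hypothetical stand-ins for whatever pootle.backend would provide;
they only illustrate client code that never touches files or locking.

```python
from xmlrpc.server import SimpleXMLRPCDispatcher, SimpleXMLRPCServer


class InMemoryBackend:
    """Hypothetical backend object; a real one would hide files or a DB."""

    def __init__(self):
        self._units = {}  # (project, language) -> {source: target}

    def add_project(self, project, language):
        self._units.setdefault((project, language), {})
        return True

    def set_translation(self, project, language, source, target):
        self._units.setdefault((project, language), {})[source] = target
        return True

    def get_translation(self, project, language, source):
        # An empty string means "not translated yet".
        return self._units.get((project, language), {}).get(source, "")


def expose(dispatcher, backend):
    """Publish the backend's methods; the server only sees the object."""
    dispatcher.register_function(backend.add_project, "add_project")
    dispatcher.register_function(backend.set_translation, "set_translation")
    dispatcher.register_function(backend.get_translation, "get_translation")


def make_server(backend, host="127.0.0.1", port=8000):
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    expose(server, backend)
    return server

# To run it: make_server(InMemoryBackend()).serve_forever()
```

The server code depends only on the backend object's methods, so the
storage behind it could be swapped without changing this script.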
It's a pretty basic requirement for any component: independence. Does
your new design offer that?
Independence of... ?
There is also a more technical question of the nature of the backend.
It can be procedural (you call functions which return things), or it
can be object-oriented: there are a few functions that return "basic"
objects defined by interfaces rather than by implementation. Then
you operate on those to get subobjects, to save the changes to the
objects, etc. Which approach are you using?
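The object-oriented variant described above could be sketched roughly
as follows in Python. All names here (Unit, Store, MemoryUnit,
MemoryStore, translate_all) are hypothetical and chosen for
illustration; they are not Pootle's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol


class Unit(Protocol):
    """Interface for one translatable string."""
    source: str
    target: str

    def save(self) -> None: ...


class Store(Protocol):
    """Interface for the units of one project in one language."""

    def units(self) -> Iterable[Unit]: ...


@dataclass
class MemoryUnit:
    source: str
    target: str = ""
    saved: bool = False

    def save(self) -> None:
        # A file-based or database-based unit would persist itself here.
        self.saved = True


@dataclass
class MemoryStore:
    items: List[MemoryUnit] = field(default_factory=list)

    def units(self) -> Iterable[MemoryUnit]:
        return self.items


def translate_all(store: Store, translations: Dict[str, str]) -> int:
    """Client code written only against the interfaces above."""
    changed = 0
    for unit in store.units():
        if unit.source in translations:
            unit.target = translations[unit.source]
            unit.save()
            changed += 1
    return changed
```

A procedural backend would instead expose flat functions, such as a
hypothetical set_target(project, language, source, target), with no
objects for the caller to hold on to.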
A few more notes:
Trying to stick to a standard format (XLIFF, .po or anything else)
for backend storage is not a good idea because it will be unnecessarily
limiting. The standards have their own specifics which we may not
care about, so there's the overhead of storing things the 'standard'
way, even if it is not convenient. Eventually the format will simply
not be enough. We will want to keep lots more metadata (e.g.,
string history, string submitter, date, ...). Storing that in external
files, separated from the actual data, will be increasingly
uncomfortable.
You should read the standards. They are made by people who have been
working in localisation for many years and have an extremely clear
understanding of what is necessary, how to structure it and how it
should be encoded. We DO care about them. If there was no PO standard,
each FOSS application would have to do its own translation editor, and
we would not be here now. We produce standard files so that standard
translation editors can be developed and used. There is no modern
computer science without standards.
Standards are in constant review and evolution, making sure that new
types of data that might be necessary are implemented. There is no such
thing as "eventually the format will not be enough". Besides standard
extensions, XLIFF and many other XML formats (such as OpenDocument)
allow user extensions, for the cases in which the people who define the
format might have left something out. We are working with XLIFF 1.1, but
XLIFF 2.0 is being worked on, even if very few changes will take place,
and all of them backwards compatible.
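To make the point about user extensions concrete, here is a small
Python sketch that attaches extra metadata to an XLIFF-style trans-unit
through a separate XML namespace. The extension namespace URI, the
"px" prefix and the submitter attribute are invented for this example,
and the sketch assumes that trans-unit accepts attributes from other
namespaces; check the XLIFF specification for the exact extension
points.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.1 documents (verify against the specification).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"
# Hypothetical namespace for project-specific metadata.
EXT_NS = "http://example.org/pootle-ext"

ET.register_namespace("xlf", XLIFF_NS)
ET.register_namespace("px", EXT_NS)


def make_unit(unit_id, source, target, submitter):
    """Build a trans-unit carrying one custom, namespaced attribute."""
    unit = ET.Element(f"{{{XLIFF_NS}}}trans-unit", {"id": unit_id})
    ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = source
    ET.SubElement(unit, f"{{{XLIFF_NS}}}target").text = target
    # Metadata the core format does not define goes in the extension
    # namespace, so standard tools can ignore it.
    unit.set(f"{{{EXT_NS}}}submitter", submitter)
    return unit


unit = make_unit("1", "Open", "Ouvrir", "javier")
xml_text = ET.tostring(unit, encoding="unicode")
```

A tool that does not know the extension namespace can still read the
source and target elements unchanged.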
The files/DB backend debate is nevertheless independent of the use of
standards.
Anyway, from the engineering point of view, there is 'primary' data,
and there are 'views' on that data. Do not confuse the two!
XLIFF, .po or .html are just views. Data is just that, data; it has no
connection to a format until you serialize it. (An RDB is attractive
as you don't have to serialize your data and commit to a format.)
You DO commit to a format; it just happens to have very efficient ways
of handling data (in general).
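The distinction between primary data and views that is being argued
over here can be sketched in a few lines of Python. The Unit record
and the two rendering functions are hypothetical; the PO escaping
covers only backslashes, double quotes and newlines, and the sketch
takes no side on whether a database schema is itself a format.

```python
import html
from dataclasses import dataclass


@dataclass
class Unit:
    """Primary data: one translated string plus some metadata."""
    source: str
    target: str
    submitter: str = ""


def po_quote(text):
    """Quote a string for a PO file (simplified escaping)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_po(unit):
    """One view of the data: a Gettext PO entry."""
    return f"msgid {po_quote(unit.source)}\nmsgstr {po_quote(unit.target)}\n"


def to_html(unit):
    """Another view of the same data: a row for an on-line editor."""
    return (
        f"<tr><td>{html.escape(unit.source)}</td>"
        f"<td>{html.escape(unit.target)}</td></tr>"
    )


unit = Unit('Say "hi"', 'Dis "salut"', submitter="gintautas")
po_entry = to_po(unit)
html_row = to_html(unit)
```

Note that the submitter field appears in neither view, which is the
kind of metadata both sides of the discussion are concerned about.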
Frankly I am extremely puzzled with what I perceive to be a hostile
look towards RDBMSes. Some seem to be willing to jump through many
hoops to defend file-based approaches. In fact I found conflicting
advice in the wiki itself:
Please do not confuse being careful with being against something, and
please do not use words such as "hostile". Some of us have been working on
localisation for quite a number of years, as well as in development,
databases and development of standards. We understand the complete set
of data that needs to be managed, something in which some of us have
been working for quite a while. The DB vs. files approach has been
discussed innumerable times; we are quite aware of the advantages of
databases, and of their problems. Any argument that you might put
forward has already been used internally, by people who have been using
DBs for quite a while (some of us for 20 years). The use of files has
quite a number of advantages, and the project has followed this line of
development, which we question often, but never strongly enough to
abandon current developments and change. The issue of scalability
requires that we look at the DB approach. Having said this, your
forceful approach on DB demands black/white answers (agree/do not agree)
on a subject that for us is much more complex.
If there is change, it will not be tomorrow. It requires clear planning
and some security that the new approach is better, which we will only
have through experience. This is why I proposed in my prior mail
developing an experimental second DB-based back-end (which we are
prepared to fund), to ensure that all data can be easily mapped and that
it works better. If it turns out to be clearly better, we will be the
first ones to go for it.
I am feeling a little frustrated. I sincerely want to help this project
as much as I can, and I feel that these fundamental issues must be
resolved before I get deep into the technical details.
The work that was planned for your SoC project was very clear, and
does not require any decision on the technology of the back-end. It will
help the implementation of different approaches, but those approaches do
not need to be decided now (even if work on figuring out whether they are
better can start immediately).
Your opinion on the back-end is important, as are many others', but
please remember that there are other people involved, and that there are
reasons why we do things the way we do them. At some point we might need
to change the way things are done, but we need to be sure that we are
moving to a better approach. Opinions are not enough.
Javier