On Fri, 13 Feb 2009, Lucas Nussbaum wrote:
> It depends on your data source. I'm not familiar with DDTP. If (package, version) is enough as a primary key, let's just use that.
I committed a DDTP importer to collab-qa/udd. It is based on DDTP Translation files that are enriched with the package version (unlike the files distributed to all mirrors) and are available at http://ddtp.debian.net/Translation_udd (see svn://svn.debian.org/svn/collab-qa/udd/config_ddtp.yaml).

I met Grisu in person last weekend, and the explanation for the issue is the following. Originally Grisu assumed that the MD5 sum of an English description is sufficient as a key to assign a translation to a certain package. He considered the package version uninteresting because a description might stay constant over several versions of a package. The tools that work with the Translation files are adapted to this philosophy, and there are several of them. Grisu told me that he provides only uncompressed text files. These are picked up by some ftpmaster tools, which first check the contents of the files (so an extra field would not pass this check for the moment) and then propagate compressed versions to the Debian mirrors. Tools like apt and others might rely on this format. I'd guess it is cheap to provide a patch that simply ignores an additional field, and perhaps everything would work out of the box, but currently we do not know this.

When I started with the UDD gatherer for DDTP, I learned that there are several translations for the same package in sid. The reason is that some architectures might not catch up as quickly as others: if the description of such a package has changed, you end up with two or more translations for one package and have to map them sensibly to the packages inside UDD. To tackle this I tried to calculate MD5 sums of the package descriptions, which turned out to be quite error-prone. The code became hard to read, hacky and not really reliable (perhaps that is just me, but anyway). So the best idea turned out to be adding the version information directly to the Translation files.
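To illustrate why the version helps, here is a minimal Python sketch (not the committed udd/ddtp_gatherer.py) that indexes translations by (package, version) instead of by description MD5. It assumes the version-enriched files are deb822-style stanzas carrying a "Version:" field next to "Package:"; that field name, the helper names and the sample data are hypothetical and only serve as illustration.

```python
"""Hypothetical sketch: index DDTP translations by (package, version).

Assumption: each stanza of a version-enriched Translation file carries a
"Version:" field next to "Package:"; the real field names may differ.
"""
import gzip
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def iter_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield deb822-style stanzas; lines starting with a space or tab continue the previous field."""
    stanza: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Continuation line of a multi-line field (e.g. the long description).
            stanza[last_key] += "\n" + line
        else:
            # "Field: value"; partition on the first colon keeps epochs like "1:2.0-3" intact.
            key, _, value = line.partition(":")
            last_key = key.strip()
            stanza[last_key] = value.strip()
    if stanza:
        yield stanza


def index_translations(lines: Iterable[str], lang: str) -> Dict[Tuple[str, str], str]:
    """Map (package, version) to the translated description for one language.

    Two stanzas for the same package but different versions (e.g. when some
    architectures lag behind in sid) stay separate; no MD5 has to be computed.
    Stanzas without Package or Version are skipped.
    """
    index: Dict[Tuple[str, str], str] = {}
    for stanza in iter_stanzas(lines):
        package, version = stanza.get("Package"), stanza.get("Version")
        desc = stanza.get("Description-" + lang)
        if package and version and desc is not None:
            index[(package, version)] = desc
    return index


def read_translation_file(path: str) -> List[str]:
    """Read the lines of a fetched Translation-<lang>.gz file (path is illustrative)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.readlines()


# Made-up sample data in the assumed format.
SAMPLE = """\
Package: foo
Version: 1.0-1
Description-de: Kurzbeschreibung alt
 Lange Beschreibung.

Package: foo
Version: 1.1-1
Description-de: Kurzbeschreibung neu
 Lange Beschreibung.
"""

if __name__ == "__main__":
    idx = index_translations(SAMPLE.splitlines(keepends=True), "de")
    for (pkg, ver), desc in sorted(idx.items()):
        print(pkg, ver, desc.splitlines()[0])
```

Because the version is part of the key, the two translations of foo do not collide. Matching by MD5 alone would instead require recomputing description hashes from the package data, which is the error-prone step described above.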
There was some discussion with Grisu about redundancy. First, I think redundancy is not bad per se; there can be good reasons for it, for instance if the code becomes more robust and reliable (and in addition avoids expensive calculations: compare calculating an MD5 sum *and* comparing the result, against just comparing a version string). Moreover, it is not redundant inside the DDTP table; it just adds the extra information about the version, which actually *is* in the package pool (as I explained above, one MD5 sum might be valid for several versions). The result of these considerations was that Grisu now runs the very same export job on the DDTP database twice: once into the established format without version information and once into the version-enriched format for a simple import into UDD. If you agree, I will try to make this the single "official" format, because I'm not really happy about having an extra service for UDD; sooner or later things might diverge, and it is better to have a single default.

This is the current situation, and the things I describe below are based on these version-enriched DDTP files.

Commits to svn://svn.debian.org/svn/collab-qa/udd/:

1. config_ddtp.yaml
   Configuration file that sets the path and location of the DDTP files and the releases we consider. We import everything DDTP supports, so there is no need to specify the languages explicitly.

2. sql/ddtp.sql
   Creates the ddtp table in UDD. Some fields have comments. I wonder whether we should rely on the inline comments in this file or implement "COMMENT ON TABLE ddtp IS ...". Just tell me what you prefer.

3. scripts/fetch_ddtp_translations.sh
   Fetches the Translation-<lang>.gz files from the DDTP server via HTTP using curl.
   I did not find a better method to obtain "all files in a web directory" (we want to get all supported languages reliably, even if some are added later) than using curl together with the contributed script http://cool.haxx.se/cvs.cgi/curl/perl/contrib/getlinks.pl.in. I'm not perfectly happy about using a script that is not yet packaged, and perhaps I should implement the fetching script using Perl's LWP::UserAgent. Just tell me if you see the current method as a drawback and I'll change it.

4. scripts/getlinks.pl
   The script from curl contrib mentioned above.

5. udd/ddtp_gatherer.py
   The actual gatherer, which parses the previously fetched Translation-<lang>.gz files and injects the information into the ddtp table of UDD. The content of the table is deleted completely before every import, and then the content of all fetched Translation files is imported.

Remark: I have some more or less working, hackish code for gathering the information from Translation files without versions. Just tell me whether I should commit it for comparison.

The gatherer works if you try:

   python udd.py config_ddtp.yaml update ddtp
   python udd.py config_ddtp.yaml run ddtp

Please tell me what steps have to be done next to finally make this work as an official UDD gatherer in the regular cron job.

Kind regards

       Andreas.

-- 
http://fam-tille.de