On Mon, Feb 27, 2012 at 12:20:31AM +0000, Stuart Prescott wrote:
> I think your changes are necessary so that the derivatives_descriptions
> table, which we are currently not populating, can eventually be properly
> populated, and there is some benefit in having the same schema for each of
> the *_descriptions tables even if the debian_descriptions table will always
> have "debian" in that column.

No, even the (debian_)descriptions table has (on blends.debian.net):

udd=# SELECT distribution, component, release, count(*) from descriptions where language = 'en' group by distribution, component, release order by distribution, release, component;
   distribution   |       component       |         release          | count
------------------+-----------------------+--------------------------+-------
 debian           | contrib               | experimental             |    12
 debian           | main                  | experimental             |  2271
 debian           | non-free              | experimental             |    39
 debian           | contrib               | sid                      |   237
 debian           | main                  | sid                      | 37198
 debian           | non-free              | sid                      |   518
 debian           | contrib               | squeeze                  |   189
 debian           | main                  | squeeze                  | 28662
 debian           | main/debian-installer | squeeze                  |  1043
 debian           | non-free              | squeeze                  |   427
 debian           | main                  | squeeze-proposed-updates |   248
 debian           | contrib               | squeeze-security         |     1
 debian           | main                  | squeeze-security         |  1032
 debian           | main                  | squeeze-updates          |   163
 debian           | contrib               | wheezy                   |   216
 debian           | main                  | wheezy                   | 35487
 debian           | non-free              | wheezy                   |   450
 debian           | main                  | wheezy-proposed-updates  |   115
 debian-backports | contrib               | squeeze                  |     8
 debian-backports | main                  | squeeze                  |  1247
 debian-backports | main/debian-installer | squeeze                  |     4
 debian-backports | non-free              | squeeze                  |    32
(22 rows)

so even if we ignore derivatives_descriptions we need this column
(although it certainly becomes much more obvious once we take
derivatives into consideration).

> That said, I don't believe this change will actually help the problem you
> are seeking to address: "distribution" should be uniformly "debian" in the
> descriptions generated by the packages gatherer for all Packages files
> coming from Debian. Descriptions imported by the ddtp gatherer itself will
> also always have "debian" at this stage.

Well, IMHO the ddtp gatherer should reflect what the packages table has.
On the official udd.debian.org it is:

udd=> SELECT distribution, count(*) from packages group by distribution;
      distribution       | count
-------------------------+--------
 debian                  | 942393
 debian-backports        |  26766
 lenny-volatile-proposed |    205
 debian-backports-sloppy |    408
 lenny-volatile          |    127
(5 rows)

So I'm not fully sure whether we can go with 'debian' only for
distribution in the descriptions table. The only way to prevent "other"
distributions like debian-backports from injecting their descriptions
would be to drop the

    descriptions-table: descriptions

entry from the "debian-backports-squeeze:" section in the config file.

I only noticed this problem when I was running the gatherer on
blends.debian.net and detected that there are far too few en
descriptions for squeeze. The reason was that debian-backports-squeeze
was imported after debian-squeeze, and the descriptions matching
release='squeeze' and language='en' were replaced by this later import
because distribution was not taken into account.

> The data from squeeze vs squeeze-backports should be differentiated in the
> "release" column, not in the distribution column.

This is what I thought in the first place, but I noticed that in the
packages / sources tables it is handled via the distribution column. I
do not think we should break the logic of these tables (even though I
do have some sympathy for your argument).

> Looking at config-org.yaml, I suspect that the real problem is that the
> "release" key for squeeze-backports is incorrectly set:
>
>   debian-backports-squeeze:
>     [...]
>     release: squeeze
>
> if set to "squeeze-backports" then the release column will instead
> distinguish the translations from one-another in the (package, release,
> component) tuple. This is probably a simple copy+paste error from the
> squeeze release; fixing this should also fix the translation clobbering
> problem.

I have put Lucas in CC to ask whether this is really the case. In any
case this should be discussed here.

Kind regards

      Andreas.
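
To make the clobbering described above concrete, here is a minimal
sketch of how it can happen. It is only a model under stated
assumptions, not the actual gatherer code: the function names
import_descriptions() and count_after_imports(), the reduced table
layout, the sample packages and the delete-then-insert strategy are all
hypothetical, and SQLite stands in for the real PostgreSQL database.

```python
import sqlite3


def import_descriptions(conn, distribution, release, rows, regard_distribution):
    """Hypothetical importer: clear the old 'en' rows, then insert the new ones.

    This is an assumed model of the behaviour, not the real gatherer.
    """
    if regard_distribution:
        # Only rows of the same distribution and release are replaced.
        conn.execute(
            "DELETE FROM descriptions "
            "WHERE distribution = ? AND release = ? AND language = 'en'",
            (distribution, release),
        )
    else:
        # Distribution is ignored, so a later import of another
        # distribution with the same release name wipes earlier rows.
        conn.execute(
            "DELETE FROM descriptions WHERE release = ? AND language = 'en'",
            (release,),
        )
    conn.executemany(
        "INSERT INTO descriptions "
        "(package, distribution, release, language, description) "
        "VALUES (?, ?, ?, 'en', ?)",
        [(pkg, distribution, release, desc) for pkg, desc in rows],
    )


def count_after_imports(regard_distribution):
    """Import debian/squeeze, then debian-backports/squeeze, and count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE descriptions (package TEXT, distribution TEXT, "
        "release TEXT, language TEXT, description TEXT)"
    )
    # Sample data, invented for illustration only.
    squeeze = [("bash", "shell"), ("coreutils", "core tools"), ("sed", "stream editor")]
    backports = [("git", "version control")]
    import_descriptions(conn, "debian", "squeeze", squeeze, regard_distribution)
    import_descriptions(conn, "debian-backports", "squeeze", backports, regard_distribution)
    query = "SELECT distribution, count(*) FROM descriptions GROUP BY distribution"
    return dict(conn.execute(query))


if __name__ == "__main__":
    print(count_after_imports(False))  # debian rows are gone
    print(count_after_imports(True))   # both distributions survive
```

In this model the second import removes all debian/squeeze rows when
distribution is ignored, while both sets of rows survive once
distribution is part of the match. Setting the release to
"squeeze-backports", as suggested in the quoted text, would avoid the
collision in the same way by making the release values differ.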

PS: I just noticed that there is another issue left with the ddtp
    importer: it very frequently complains about duplicated data sets.
    I need to track this down in the next couple of evenings.

-- 
http://fam-tille.de

Archive: http://lists.debian.org/20120228110658.gf15...@an3as.eu