--- Begin Message ---
Package: qa.debian.org
Severity: wishlist
User: qa.debian....@packages.debian.org
Usertags: udd
Hi.
I already discussed this feature with Lucas Nussbaum at Debconf11. It
would be nice to have a gatherer for patch-tracker.d.o data in UDD. I
already wrote a patch for patch-tracker.d.o to export the data we could
be interested into in a JSON file. The patch is still waiting for review
and I'll ping Sean in the next days.
I also wrote patches for UDD, and I'm attaching them here. They're still
incomplete, need to be adjusted for the actual setup of UDD (which I do
not know enough), still have some FIXME for potential SQL injection.
Moreover, SQL schemas have to be fixed and the whole infrastructure for
getting JSON data from patch-tracker.d.o must be set up.
But, in the meantime, people can have a look at the code to understand
what's going on.
I'm also attaching a draft of the SQL schema (which probably still lacks
things, most notably sane indexes). A gzipped test data file (based on a
snapshot of a few days ago) can be found here:
http://people.debian.org/~gio/patch_tracker.json.gz
Thanks, Gio.
--
Giovanni Mascellani <mascell...@poisson.phc.unipi.it>
Pisa, Italy
Web: http://poisson.phc.unipi.it/~mascellani
Jabber: g.mascell...@jabber.org / giova...@elabor.homelinux.org
From ef97570af8a56e7018b971574152cdcbd29c1b8b Mon Sep 17 00:00:00 2001
From: Giovanni Mascellani <mascell...@poisson.phc.unipi.it>
Date: Mon, 1 Aug 2011 11:51:39 +0200
Subject: [PATCH] Added sketch of patch_tracker gatherer.
---
udd/config-org.yaml | 7 +++++
udd/doc/sources/patch-tracker | 24 +++++++++++++++++
udd/udd/patch_tracker_gatherer.py | 52 +++++++++++++++++++++++++++++++++++++
3 files changed, 83 insertions(+), 0 deletions(-)
create mode 100644 udd/doc/sources/patch-tracker
create mode 100644 udd/udd/patch_tracker_gatherer.py
diff --git a/udd/config-org.yaml b/udd/config-org.yaml
index 0a0745b..368425c 100644
--- a/udd/config-org.yaml
+++ b/udd/config-org.yaml
@@ -13,6 +13,7 @@ general:
orphaned-packages: module udd.orphaned_packages_gatherer
bugs: exec DEBBUGS_CONFIG_FILE=/org/bugs.debian.org/etc/config perl /org/udd.debian.org/udd/udd/bugs_gatherer.pl
carnivore: module udd.carnivore_gatherer
+ patch-tracker: module udd.patch_tracker_gatherer
lintian: module udd.lintian_gatherer
debtags: module udd.debtags_gatherer
ubuntu-bugs: module udd.ubuntu_bugs_gatherer
@@ -546,6 +547,12 @@ carnivore:
login-table: carnivore_login
schema: carnivore
+patch-tracker:
+ type: patch-tracker
+ path: /home/giovanni/packages/qa/collab-qa/udd/mirrors/patch_tracker.json
+ patch-tracker-table: patch_tracker
+ patches-table: patches
+
lintian:
type: lintian
update-command: rm -f /org/udd.debian.org/mirrors/lintian.log && wget -q http://lintian.debian.org/lintian.log -O /org/udd.debian.org/mirrors/lintian.log
diff --git a/udd/doc/sources/patch-tracker b/udd/doc/sources/patch-tracker
new file mode 100644
index 0000000..20027f9
--- /dev/null
+++ b/udd/doc/sources/patch-tracker
@@ -0,0 +1,24 @@
+DESCRIPTION
+ patch-tracker.d.o is a web service that shows the patches that get
+ applied to the Debian packages at build time. A summary of the shown
+ data is stored and published as JSON file, so it can be included
+ in UDD.
+
+ We import such information assigning a unique integer ID to each
+ pair (package, release). Two tables are then used to store the
+ data:
+ * patch-tracker-table: keeps general information about the
+ package (name, version, release) and about the patching method
+ it uses (the series type - i.e., whether it uses quilt, dpatch
+ or other systems, including the 'native' and 'no_series' values;
+ and the number of lines added and removed outside the debian/
+ directory by the .diff.gz or .debian.tar.gz patch);
+ * patches-table: keeps information about the specific patches
+ (the package to which belongs, its name and the number of lines
+ added and removed); it will probably extended in the future to
+ store also DEP 3 metadata.
+
+CONFIGURATION:
+ path: The path of the JSON statistics from patch-tracker.d.o.
+ patch-tracker-table, patches-table: The names of the database
+ tables used to describe the packages and patches.
diff --git a/udd/udd/patch_tracker_gatherer.py b/udd/udd/patch_tracker_gatherer.py
new file mode 100644
index 0000000..ed56727
--- /dev/null
+++ b/udd/udd/patch_tracker_gatherer.py
@@ -0,0 +1,52 @@
+#!/usr/bin/env python
+
+import sys
+from gatherer import gatherer
+import json
+
+def get_gatherer(connection, config, source):
+ return patch_tracker_gatherer(connection, config, source)
+
+class patch_tracker_gatherer(gatherer):
+
+ def __init__(self, connection, config, source):
+ gatherer.__init__(self, connection, config, source)
+ self.assert_my_config('path', 'patch-tracker-table', 'patches-table')
+
+ def run(self):
+ my_config = self.my_config
+
+ cur = self.cursor()
+
+ cur.execute("DELETE FROM %s" % my_config["patches-table"])
+ cur.execute("DELETE FROM %s" % my_config["patch-tracker-table"])
+
+ # FIXME SQL Injection exposures here
+ cur.execute("""PREPARE patch_tracker_insert
+ AS INSERT INTO %s (id, package, version, release, series_type, nondebian_added, nondebian_removed)
+ VALUES ($1, $2, $3, $4, $5, $6, $7)""" % (my_config['patch-tracker-table']))
+ cur.execute("""PREPARE patches_insert
+ AS INSERT INTO %s (id, package_id, lines_added, lines_removed, name)
+ VALUES ($1, $2, $3, $4, $5)""" % (my_config['patches-table']))
+
+ # TODO We don't need to keep all the decoded JSON file in memory; there are
+ # libraries to access it in a stream (DOM-like) fashion, but Python
+ # bindings don't appear to be available in Debian (libyajl)
+ patch_tracker_data = json.load(open(my_config['path']))
+ package_number = 0
+ patch_number = 0
+ for line in patch_tracker_data:
+ package, suite, version, series_type, nondebian, patches = \
+ (line['package'], line['suite'], line['version'],
+ line['series_type'], line['nondebian'], line['patches'])
+ cur.execute("EXECUTE patch_tracker_insert (%d, '%s', '%s', '%s', '%s', %d, %d);" %
+ (package_number, package, version, suite, series_type, nondebian[0], nondebian[1]))
+ for patch_name, patch_data in patches:
+ cur.execute("EXECUTE patches_insert (%d, %d, %d, %d, '%s');" %
+ (patch_number, package_number, patch_data[0], patch_data[1], patch_name))
+ patch_number += 1
+ package_number += 1
+
+ cur.execute("ANALYZE %s" % my_config["patch-tracker-table"])
+ cur.execute("ANALYZE %s" % my_config["patches-table"])
+
--
1.7.5.4
--
-- Name: patch_tracker; Type: TABLE; Schema: public; Owner: giovanni; Tablespace:
--
CREATE TABLE patch_tracker (
id integer NOT NULL,
package text NOT NULL,
version debversion NOT NULL,
release text NOT NULL,
series_type text NOT NULL,
nondebian_added integer NOT NULL,
nondebian_removed integer NOT NULL
);
--
-- Name: patch_tracker_pkey; Type: CONSTRAINT; Schema: public; Owner: giovanni; Tablespace:
--
ALTER TABLE ONLY patch_tracker
ADD CONSTRAINT patch_tracker_pkey PRIMARY KEY (id);
--
-- Name: package_version_idx; Type: INDEX; Schema: public; Owner: giovanni; Tablespace:
--
CREATE INDEX package_version_idx ON patch_tracker USING btree (package, version);
--
-- Name: patches; Type: TABLE; Schema: public; Owner: giovanni; Tablespace:
--
CREATE TABLE patches (
id integer NOT NULL,
package_id integer NOT NULL,
lines_added integer NOT NULL,
lines_removed integer NOT NULL,
name text NOT NULL
);
--
-- Name: patches_pkey; Type: CONSTRAINT; Schema: public; Owner: giovanni; Tablespace:
--
ALTER TABLE ONLY patches
ADD CONSTRAINT patches_pkey PRIMARY KEY (id);
--
-- Name: package_id_idx; Type: INDEX; Schema: public; Owner: giovanni; Tablespace:
--
CREATE INDEX package_id_idx ON patches USING btree (package_id);
-- Name: patches_package_id_fkey; Type: FK CONSTRAINT; Schema: public; Owner: giovanni
--
ALTER TABLE ONLY patches
ADD CONSTRAINT patches_package_id_fkey FOREIGN KEY (package_id) REFERENCES patch_tracker(id);
signature.asc
Description: OpenPGP digital signature
--- End Message ---