Re: Gathering package upstream meta-data in the UDD. (was: Re: more formally indicating the registration URL)

2009-11-03 Thread Olivier Berger
Hi.

(Responding a little late after vacation time.)

Le lundi 26 octobre 2009 à 23:05 +0900, Charles Plessy a écrit :
> > On Thu, Oct 22, 2009 at 12:30:06AM +0900, Charles Plessy wrote:
> > > First of all, let's summarise the situation. We want to integrate some
> > > metadata in our 'web sentinels', like
> > > 'http://debian-med.alioth.debian.org/tasks/bio'.
> 
> Dear Andreas and Olivier,
> 
> thank you for your encouraging comments. 

SNIP

> In parallel, as Olivier suggested, each table could be exported in RDF
> format. But I am not sure I understand it.

What exactly don't you understand? ;) If you look back at the pointers
I provided in http://lists.debian.org/debian-qa/2009/10/msg00050.html
you'll find an example of using the PRISM and CONNOTEA ontologies for
links with DOI and PUBMED IDs (there may be more details in
http://www.prismstandard.org/resources/mod_prism.html).

>  Olivier, could you suggest a Perl module to
> use?
> 

I suppose that searching for perl+rdf on your preferred search engine
will retrieve useful code ;)

I'm not a Perl hacker myself, but as RDF is a W3C standard, there
is probably plenty of Perl code available to produce RDF.

http://search.cpan.org/~mthurn/RDF-Simple-0.415/lib/RDF/Simple/Serialiser.pm 
seems to be a valid candidate for first experiments.
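
To make the idea of exporting a table as RDF more concrete, here is a
minimal sketch in Python (not Perl, and not using the module above). It
writes one N-Triples line per (package, DOI) row. The base URI, the sample
row, the placeholder DOI and the choice of the Dublin Core 'references'
property are all assumptions made for illustration; the PRISM and CONNOTEA
vocabularies mentioned earlier are not used here.

```python
# Minimal sketch: emit one RDF statement per (package, DOI) pair as N-Triples.
# The base URI, the sample row and the use of dcterms:references are
# assumptions for illustration, not the vocabulary discussed in the thread.

DCTERMS_REFERENCES = "http://purl.org/dc/terms/references"
PACKAGE_BASE = "http://example.org/package/"  # hypothetical base URI


def triple(subject, predicate, obj):
    """Format a triple whose three terms are all URIs as one N-Triples line."""
    return "<%s> <%s> <%s> ." % (subject, predicate, obj)


def rows_to_ntriples(rows):
    """Turn (package, doi) rows into newline-separated N-Triples lines."""
    lines = []
    for package, doi in rows:
        lines.append(triple(PACKAGE_BASE + package,
                            DCTERMS_REFERENCES,
                            "https://doi.org/" + doi))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical row; the DOI is a placeholder, not a real reference.
    print(rows_to_ntriples([("examplepkg", "10.0000/example")]))
```

Any RDF toolkit should be able to read such output, so the serialiser can
later be swapped for a proper library without changing the data model.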

Hope this helps.

Best regards,
-- 
Olivier BERGER 
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)





Bug#549227: marked as done (UDD: please collect and expose the load time for update scripts)

2009-11-03 Thread Debian Bug Tracking System
Your message dated Tue, 3 Nov 2009 15:36:44 +0100
with message-id <20091103143644.gc3...@xanadu.blop.info>
and subject line Re: Bug#549227: UDD: please collect and expose the load time 
for update scripts
has caused the Debian Bug report #549227,
regarding UDD: please collect and expose the load time for update scripts
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
549227: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549227
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: qa.debian.org
Severity: wishlist
User: qa.debian@packages.debian.org
Usertags: udd

Hi!
It's common in a data warehouse system (which UDD can be considered to be) to
keep track of the update jobs' timings: start, end, duration, records processed
and so on.

This would make it possible to query that information and generate reports on
job executions: durations (mean, stddev, etc.), growth, performance, possible
tuning needed because of interactions with other scripts, and so on.

Such information is usually stored in a separate (internal) schema rather than
the main one, but I think we can just add a table in 'udd' (maybe prefixed with
'udd_' to clarify that it's a UDD-internal table) for it.
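
A minimal sketch of such a bookkeeping table, in Python. It uses the
standard sqlite3 module only so that it is self-contained, whereas UDD
itself runs on PostgreSQL. The table name 'udd_job_runs', its columns and
the run_job() helper are hypothetical and are not taken from the UDD code.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS udd_job_runs (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    start_time TEXT NOT NULL,
    end_time TEXT,
    records INTEGER
)
"""


def run_job(conn, source, job):
    """Run job() and record its start time, end time and record count.

    job is any callable returning the number of records it processed.
    """
    start = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO udd_job_runs (source, start_time) VALUES (?, ?)",
        (source, start))
    run_id = cur.lastrowid
    records = job()
    end = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE udd_job_runs SET end_time = ?, records = ? WHERE id = ?",
        (end, records, run_id))
    conn.commit()
    return run_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    run_job(conn, "example-source", lambda: 42)
    print(conn.execute(
        "SELECT source, records FROM udd_job_runs").fetchall())
```

A row whose end_time is still NULL then marks a job that started but never
finished, which is useful information for reports as well.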

Thanks,
Sandro

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash


--- End Message ---
--- Begin Message ---
On 02/11/09 at 22:59 +0100, Serafeim Zanikolas wrote:
> Hi guys,
> 
> On Wed, Oct 07, 2009 at 09:47:11PM +0200, Lucas Nussbaum wrote [edited]:
> > On 01/10/09 at 20:08 +0200, Sandro Tosi wrote:
> > > It's common in a datawarehouse system (like UDD can be considered) to
> > > keep track of the update jobs times: start, end, duration, records
> > > elaborated and so on.
> [..]
> > A patch adding the table you describe would be appreciated (the code
> > would have to be python)
> 
> Patch attached. I didn't add a duration column as it's trivially calculated on
> the fly. I'm open to suggestions about getting record counts before and after
> updates in a generic way.

Thanks a lot, I've applied it (the table is named timestamps, not
udd_timestamps) and adapted the check_timestamp script that tells me when
data sources have not been updated for a long time.
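
The two ideas above (computing the duration on the fly instead of storing
it, and flagging data sources that have not been updated recently) can be
sketched in Python as follows. This is not the actual check_timestamp
script: the function names, the sample source names and the two-day
threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta


def duration(start, end):
    """Duration of one run, derived from its stored start and end times."""
    return end - start


def stale_sources(last_end_by_source, now, max_age):
    """Return the sorted names of sources whose last run ended too long ago."""
    return sorted(source
                  for source, end in last_end_by_source.items()
                  if now - end > max_age)


if __name__ == "__main__":
    now = datetime(2009, 11, 3, 12, 0)
    # Hypothetical data: source names and times are made up for the example.
    last_end = {
        "fresh-source": datetime(2009, 11, 3, 6, 0),
        "stale-source": datetime(2009, 10, 20, 6, 0),
    }
    print(duration(datetime(2009, 11, 3, 6, 0), datetime(2009, 11, 3, 6, 5)))
    print(stale_sources(last_end, now, timedelta(days=2)))
```
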

> ps. hacking UDD would be more fun without mixed indentation ;)

I thought I had fixed all the files, but I missed udd.py. Fixed now.
-- 
| Lucas Nussbaum
| lu...@lucas-nussbaum.net   http://www.lucas-nussbaum.net/ |
| jabber: lu...@nussbaum.fr GPG: 1024D/023B3F4F |

--- End Message ---