On 20 April 2012 19:46, Corentin Chary <corentin.ch...@gmail.com> wrote:
> On Fri, Apr 20, 2012 at 9:37 AM, Kent Fredric <kentfred...@gmail.com> wrote:
>> On 20 April 2012 03:31, Corentin Chary <corentin.ch...@gmail.com> wrote:
>>> Add rubygems, github, gitorious, pecl, pear, bitbucket.
>>> All of them are handled by my remoteids.py script.
>>>
>>> ref: https://bugs.gentoo.org/show_bug.cgi?id=406287
>>> ref: https://github.com/iksaif/portage-janitor/blob/master/remoteids.py
>>>
>>> --- a/metadata/dtd/metadata.dtd 2010-03-02 18:52:11.000000000 +0100
>>> +++ b/metadata/dtd/metadata.dtd 2012-04-19 14:22:14.077954310 +0200
>>> @@ -61,7 +61,7 @@
>>> <!ELEMENT bugs-to (#PCDATA)>
>>> <!-- specify a type of package identification tracker -->
>>> <!ELEMENT remote-id (#PCDATA)>
>>> - <!ATTLIST remote-id type
>>> (freshmeat|sourceforge|sourceforge-jp|cpan|vim|google-code|ctan|pypi|rubyforge|cran)
>>> #REQUIRED>
>>> + <!ATTLIST remote-id type
>>> (freshmeat|sourceforge|sourceforge-jp|cpan|vim|google-code|ctan|pypi|rubyforge|cran|rubygems|github|gitorious|pecl|pear|bitbucket)
>>> #REQUIRED>
>>>
>>> <!-- category/package information for cross-linking in descriptions
>>> and useflag descriptions -->
>>>
>>> --
>>> Corentin Chary
>>> http://xf.iksaif.net/
>>
>>
>> I suggested last week on #gentoo-perl that it might be nice to have
>> 'cpan' and 'cpan-module' ( or something like that ) to disambiguate 2
>> queryable terms. ( where 'cpan' => 'the package name on cpan' )
>>
>> For some purposes, its most convenient to use the distribution name,
>> and for other purposes, (ie: cpan clients) its more convenient to use
>> a Module name, and its not easy to translate between the two, as
>> Module names sometimes switch between packages they're shipped in.
>>
>> For instance, a while ago, the BioPerl module was shipped in a
>> distribution 'bioperl' , which has only recently been changed to
>> BioPerl
>>
>> http://api.metacpan.org/release/_search?q=distribution:bioperl&fields=archive,author,date,download_url
>>
>> http://api.metacpan.org/release/_search?q=distribution:BioPerl&fields=archive,author,date,download_url
>>
>> vs
>>
>> http://api.metacpan.org/module/_search?q=module.name:Bio\:\:Perl&fields=distribution,author,release
>
> Looks sane since the goal of remote-id is being able to identify the
> package upstream.
> Do you think you could patch remotesid.py to generate tags for cpan /
> cpan-modules ? Or at least give me a pseudo-algo that does the trick.
> Thanks :)
>
> --
> Corentin Chary
> http://xf.iksaif.net
>
That is sadly not straightforward.

Extracting the package name is straightforward if you have the URL,
because the package name is literally the same as the archive name in
SRC_URI, sans version information. However, if you look at many perl
ebuilds, you'll notice many lack this field and we've got other things
in place, so the current parsing technique you use to detect uses of
SRC_URI won't work there ( I could be wrong, I don't fully grok your
python code ).

Moreover, determining the value of 'cpan-module' may be impossible
without access to the tar.gz itself, or querying the MetaCPAN API.

Usually, upstream are sensible and have package names which closely
correspond with the module names, ie: "Dist::Zilla" is shipped in
'Dist-Zilla-$VERSION.tar.gz', but there are many packages which don't
do this, such as this notable example:

https://metacpan.org/release/Scalar-List-Utils

which has no modules corresponding to the package name, and no way to
divine the/a 'main' module from the package itself. This is
exacerbated by packages changing names, package joins ( 2 packages
becoming 1 via releasing modules together ), and package splits
( 1 package rips into 2 sets of modules ).

Essentially, using a cpan-module as an identifier is somewhat
"forwards only", and even then, what it will resolve to is governed by
time. This is fine for CPAN clients, which do the resolution hot,
using the whole of CPAN as their data: if a user asks for "Foo::Bar",
their cpan client will ask a cpan server ( or a regularly (hourly)
updated list ) which package that module can be found in ( and this
only returns the most recent package, so name changes and so forth are
invisible to the user ). And being helpful to CPAN clients is one of
the reasons we want this value as a specifiable option in the first
place.
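A rough sketch of the easy half, in Python since that is what
remoteids.py is written in. This is only an illustration under stated
assumptions: the function names, the regexes and the example URL are
made up here and are not taken from remoteids.py, and it only helps
for ebuilds where an archive URL is actually available. The second
function is the naive "swap '-' for '::'" guess, included to show why
it cannot be trusted for 'cpan-module'.

```python
import posixpath
import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration; not part of remoteids.py.
# Archive suffixes are an assumption, extend as needed.
_ARCHIVE_SUFFIX = re.compile(r"\.(tar\.gz|tar\.bz2|tgz|zip)$")
# Trailing "-<version>", where the version starts with a digit
# (optionally preceded by "v"). Best effort only.
_VERSION = re.compile(r"-v?\d[\w.]*$")


def cpan_dist_from_src_uri(src_uri):
    """Return the distribution name for an archive URL, or None."""
    filename = posixpath.basename(urlparse(src_uri).path)
    stem, stripped = _ARCHIVE_SUFFIX.subn("", filename)
    if not stripped:
        return None  # not an archive name we recognise
    name, stripped = _VERSION.subn("", stem)
    if not stripped or not name:
        return None  # no recognisable version suffix
    return name


def guess_main_module(dist_name):
    """Naive guess only: right for Dist-Zilla, wrong for
    Scalar-List-Utils, which ships no module of that name."""
    return dist_name.replace("-", "::")


if __name__ == "__main__":
    # Made-up URL, for illustration only.
    uri = "mirror://cpan/authors/id/X/XX/XXX/Dist-Zilla-1.0.tar.gz"
    dist = cpan_dist_from_src_uri(uri)
    print(dist, guess_main_module(dist))
```

Anything beyond that guess would need the tarball contents or a
MetaCPAN query, so the sketch deliberately stops at the package name.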
For us, it's easier to track the package name, and then when that has
to change we can manually resolve the issue.

--
Kent

perl -e "print substr( \"edrgmaM SPA NOcomil.ic\\@tfrken\", \$_ * 3, 3 ) for ( 9,8,0,7,1,6,5,4,3,2 );"

http://kent-fredric.fox.geek.nz