On Fri, Apr 20, 2012 at 10:26 AM, Kent Fredric <kentfred...@gmail.com> wrote:
> On 20 April 2012 19:46, Corentin Chary <corentin.ch...@gmail.com> wrote:
>> On Fri, Apr 20, 2012 at 9:37 AM, Kent Fredric <kentfred...@gmail.com> wrote:
>>> On 20 April 2012 03:31, Corentin Chary <corentin.ch...@gmail.com> wrote:
>>>> Add rubygems, github, gitorious, pecl, pear, bitbucket.
>>>> All of them are handled by my remoteids.py script.
>>>>
>>>> ref: https://bugs.gentoo.org/show_bug.cgi?id=406287
>>>> ref: https://github.com/iksaif/portage-janitor/blob/master/remoteids.py
>>>>
>>>> --- a/metadata/dtd/metadata.dtd 2010-03-02 18:52:11.000000000 +0100
>>>> +++ b/metadata/dtd/metadata.dtd 2012-04-19 14:22:14.077954310 +0200
>>>> @@ -61,7 +61,7 @@
>>>> <!ELEMENT bugs-to (#PCDATA)>
>>>> <!-- specify a type of package identification tracker -->
>>>> <!ELEMENT remote-id (#PCDATA)>
>>>> - <!ATTLIST remote-id type
>>>> (freshmeat|sourceforge|sourceforge-jp|cpan|vim|google-code|ctan|pypi|rubyforge|cran)
>>>> #REQUIRED>
>>>> + <!ATTLIST remote-id type
>>>> (freshmeat|sourceforge|sourceforge-jp|cpan|vim|google-code|ctan|pypi|rubyforge|cran|rubygems|github|gitorious|pecl|pear|bitbucket)
>>>> #REQUIRED>
>>>>
>>>> <!-- category/package information for cross-linking in descriptions
>>>> and useflag descriptions -->
>>>>
>>>> --
>>>> Corentin Chary
>>>> http://xf.iksaif.net/
>>>
>>>
>>> I suggested last week on #gentoo-perl that it might be nice to have
>>> 'cpan' and 'cpan-module' ( or something like that ) to disambiguate 2
>>> queryable terms. ( where 'cpan' => 'the package name on cpan' )
>>>
>>> For some purposes, it's most convenient to use the distribution name,
>>> and for other purposes, (ie: cpan clients) it's more convenient to use
>>> a Module name, and it's not easy to translate between the two, as
>>> Module names sometimes switch between packages they're shipped in.
>>>
>>> For instance, a while ago, the BioPerl module was shipped in a
>>> distribution 'bioperl', which has only recently been changed to
>>> BioPerl
>>>
>>>
>>> http://api.metacpan.org/release/_search?q=distribution:bioperl&fields=archive,author,date,download_url
>>>
>>> http://api.metacpan.org/release/_search?q=distribution:BioPerl&fields=archive,author,date,download_url
>>>
>>> vs
>>>
>>>
>>> http://api.metacpan.org/module/_search?q=module.name:Bio\:\:Perl&fields=distribution,author,release
>>
>> Looks sane since the goal of remote-id is being able to identify the
>> package upstream.
>> Do you think you could patch remoteids.py to generate tags for cpan /
>> cpan-module ? Or at least give me a pseudo-algo that does the trick.
>> Thanks :)
>>
>> --
>> Corentin Chary
>> http://xf.iksaif.net
>>
>
>
> That is sadly not straightforward. Extracting the package name can
> be straightforward if you have the URL, because the package name is
> literally the same as the archive name in SRC_URI, sans version
> information.
>
> However, if you look at many perl ebuilds, you'll notice many lack
> this field and we've got other things in place, so the current parsing
> technique you use to detect uses of SRC_URI won't work there ( I could
> be wrong, I don't fully grok your python code )
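
For the easy case described above (a CPAN archive URL is available in
SRC_URI), a rough sketch of "archive name sans version" could look like
the following. The function name, the suffix list and the version regex
are assumptions made up for illustration; none of this is taken from
remoteids.py, and ebuilds that do not set SRC_URI directly are not
handled.

```python
import re

# Archive suffixes to strip. This list is an assumption, not exhaustive.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip")

# Trailing version: a hyphen, an optional "v", a digit, then digits, dots
# or underscores, optionally followed by a word suffix such as "-TRIAL".
# Heuristic only; unusual version schemes will not match.
VERSION_RE = re.compile(r"-v?\d[\d._]*(?:-[A-Za-z]+)?$")


def cpan_dist_from_src_uri(src_uri):
    """Guess the CPAN distribution name from an archive URL.

    Returns None when the URL does not end in a known archive suffix
    or when no trailing version can be found.
    """
    filename = src_uri.rsplit("/", 1)[-1]
    for suffix in ARCHIVE_SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        return None
    name, count = VERSION_RE.subn("", stem)
    return name if count and name else None


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call.
    print(cpan_dist_from_src_uri("https://example.org/Dist-Zilla-1.0.tar.gz"))
```

This only yields the distribution name (the proposed 'cpan' value). It
says nothing about which module names the archive contains.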

Currently it uses SRC_URI and HOMEPAGE, but honestly it wouldn't be
hard to use any other environment variable and to do some checks
against a web service.
Anyway, tricky cases can still be done by hand.

> And moreover, determining the value of 'cpan-module' may be
> impossible without access to the tar.gz itself, or querying the
> MetaCPAN API.
>
> Usually, upstream are sensible and have package names which closely
> correspond with the module names, ie: "Dist::Zilla" is shipped in
> 'Dist-Zilla-$VERSION.tar.gz', but there are many packages which don't
> do this, such as this notable example:
> https://metacpan.org/release/Scalar-List-Utils , which has no modules
> corresponding to the package name, and no way to divine the/a 'main'
> module from the package itself. ( and this is exacerbated by packages
> changing names, or package joins ( 2 packages becoming 1 via releasing
> modules together ), and package splits ( 1 package rips into 2 sets
> of modules ) ).
>
> Essentially, using a cpan-module as an identifier is somewhat
> "forwards only", and even then, what it will resolve to is governed
> by time.
>
> This is fine for CPAN clients, which do the resolution hot, using the
> whole of CPAN as their data: if a user asks for "Foo::Bar", their cpan
> client will ask a cpan server ( or regularly (hourly) updated list )
> as to what package that module can be found in ( and this only returns
> the most recent package, so name changes and so-forth are invisible to
> the user ).
>
> And being helpful to CPAN clients is one of the reasons we want this
> value as a specifiable option in the first place. For us, it's easier
> to track the package name, and then when that has to change we can
> manually resolve the issue.
>
> --
> Kent
>
> perl -e "print substr( \"edrgmaM SPA NOcomil.ic\\@tfrken\", \$_ * 3,
> 3 ) for ( 9,8,0,7,1,6,5,4,3,2 );"
>
> http://kent-fredric.fox.geek.nz
>

--
Corentin Chary
http://xf.iksaif.net