On Sat, Jun 17, 2000 at 10:27:18AM -0500, Elaine -HFB- Ashton wrote:
> Graham Barr [[EMAIL PROTECTED]] quoth:
> *>>
> *>> I still love it as one big piece. I wouldn't mind producing additional
> *>> lists chapterwise though. Would that fit the bill?
> *>
> *>The catalog on the front page at search.cpan.org is supposed to be the module
> *>list split down. But it needs work.
>
> I can have a go at it if you like. It is something of a hodgepodge at the
> moment and it would be something I could manage. Some of them, like the
> FrameMaker::, etc. should be noted in the database as placeholders and the
> like.
Feel free to make any changes to the Catalog database. The Catalog module does
provide a web interface to do this, but I think it will take a lot of
work. I am talking with the Catalog people about an API so we can automate things.
> *>> If search isn't programmed to be fast, we are in deep troubles. Maybe
> *>> the code should be made publicly available and setting up mirrors of
> *>> search should be made easy. That could serve two purposes: attract
> *>> contributing programmers and later clusterize search services. Maybe
> *>> such a tarball is available already?
> *>
> *>No, it's not available yet. But the search right now is an SQL search. That
> *>needs to change.
>
> Speed isn't really the issue here as if someone hammered the site with
> requests for core modules you could have a supercomputing cluster and
> still drag it down. The idea would be to minimize the need for
> decompressing the Perl source by having it look for it in an uncompressed
> directory and fetch it from there or, as a last resort, go uncompress the
The site does keep a cache of uncompressed files.
> source. As far as actual speed is concerned, I've been impressed it can
> handle 20k requests a day without really loading the server.
20k requests a day is not very many really.
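To make the cache-first lookup discussed above concrete, here is a minimal Python sketch. It is not the code search.cpan.org runs; the function name, the flat `name + ".gz"` archive layout and the directory arguments are all assumptions made for illustration.

```python
import gzip
import shutil
from pathlib import Path


def fetch_source(name: str, cache_dir: Path, archive_dir: Path) -> Path:
    """Return an uncompressed copy of `name`, decompressing only on a cache miss.

    Hypothetical layout: archives live at archive_dir/<name>.gz and the
    uncompressed cache mirrors them at cache_dir/<name>.
    """
    cached = cache_dir / name
    if cached.exists():
        # Cache hit: no decompression work at all.
        return cached

    # Cache miss (the "last resort"): decompress once and keep the result.
    archive = archive_dir / (name + ".gz")
    cached.parent.mkdir(parents=True, exist_ok=True)
    tmp = cached.with_name(cached.name + ".tmp")
    with gzip.open(archive, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Rename into place so readers never see a half-written file.
    tmp.replace(cached)
    return cached
```

With something like this, repeated requests for the same core module only pay the decompression cost the first time.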
> I wouldn't want to wish that upon mirrors without a warning or a solution.
> The bummer about http is that it takes some work to deny access without a
> firewall. My crawling little friend from .ar was using 'Offline Explorer'
> which neither respected the robots.txt nor paid any attention to the 403
> errors he was getting. He was denied, but he was still loading up the
> server because it was trying to service the requests. I don't think the
> person was actually working at doing this either and possibly just wanted
> to collect data for a local copy for faster access. As it gets more
> popular, so too will this become more common.
Yes, it probably will. If it really becomes a problem we can put some kind
of dynamic restriction on it.
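As a rough idea of what a dynamic restriction could look like, here is a small Python sketch of a per-client sliding-window throttle. Nothing like this exists on the site as far as this thread says; the class name, the limits and the idea of keying on a client address are assumptions for illustration only.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class RequestThrottle:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Hypothetical key: the client's address as a string.
        self._hits = defaultdict(deque)

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over the limit."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[client]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The check itself is cheap, so a refused request costs far less than serving the page, though a crawler that ignores refusals still costs a connection each time.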
> *>> Sure, looks much better than before, thanks! I've replaced the thing
> *>> on PAUSE's incoming directory with this fix.
>
> :)
>
> *>Yes, clicking on a dist will always take you to the latest dist by that
> *>name rather than just by the author. It is something that needs fixing.
> *>
> *>> What search doesn't know is that both TOMC and ANDK are on an access
> *>> control list, so uploads from either of them will get indexed while
> *>> uploads by anybody else will be ignored. We need either to propagate
> *>> the ACL to search or search needs to follow
> *>> modules/02packages.details.txt.gz more closely. I'm not sure which of
> *>> the two.
> *>
> *>Neither am I
>
> That depends on what kind of ACL it is. If it is simply an issue of
> current version then the details file might be more useful. Something else
> could be done to index the modules by version.
>
> http://search.cpan.org/search?mode=author&query=TOMC gives you the right
> output because he does still have the 2.00 version in his directory, but
> clicking on it takes you to ANDK. Since ANDK holds the current version,
> this is appropriate behaviour, which is why I was suggesting that a visible
> method of marking deprecation would be useful in cases like this. Or,
> moving all deprecated modules to backpan might do as well.
I think there are many ways of approaching this problem. I have just never
been able to decide which way to jump :)
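For what it is worth, the "follow modules/02packages.details.txt.gz more closely" option could be sketched roughly as below, in Python. This assumes the index has a header block, then a blank line, then one line per package with three whitespace-separated columns (package, version, path), and that the author ID is the third component of the path. The function names are made up for this sketch and it is only one of the possible approaches.

```python
import gzip
from typing import Dict, Iterable, NamedTuple


class IndexEntry(NamedTuple):
    version: str
    path: str


def parse_package_index(lines: Iterable[str]) -> Dict[str, IndexEntry]:
    """Map each package name to the version and path listed in the index."""
    entries: Dict[str, IndexEntry] = {}
    in_body = False
    for line in lines:
        if not in_body:
            # Assumed: the header ends at the first blank line.
            if line.strip() == "":
                in_body = True
            continue
        fields = line.split()
        if len(fields) != 3:
            continue  # Skip anything that does not look like a package line.
        package, version, path = fields
        entries[package] = IndexEntry(version, path)
    return entries


def load_package_index(filename: str) -> Dict[str, IndexEntry]:
    """Read and parse a gzip-compressed index file."""
    with gzip.open(filename, "rt", encoding="utf-8", errors="replace") as fh:
        return parse_package_index(fh)


def author_of(entry: IndexEntry) -> str:
    """Author ID, assuming paths shaped like 'A/AN/ANDK/Some-Dist-1.00.tar.gz'."""
    parts = entry.path.split("/")
    return parts[2] if len(parts) >= 3 else ""
```

Search could then treat whichever author the index names for a package as the current holder, without needing a copy of the ACL.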
Graham.