Mark Stosberg wrote:
> On Tue, Feb 10, 2004 at 09:59:32AM +0100, A. Pagaltzis wrote:
>
>> * Mark Stosberg <[EMAIL PROTECTED]> [2004-02-09 15:26]:
>>
>>> I think the CPAN rating system could be of further help here as
>>> well. It could be integrated with the search.cpan.org search
>>> engine. The rating could appear on the results page, with
>>> top-rated modules appearing first. So, just searching for a
>>> module named "mail" should begin to give you a sensible
>>> result.
> ...
> From another angle, I see the current problem with the rating system as
> not abuse -- I've never noticed any beyond people rating their own
> modules with 5 stars with reviews like "It's my module". Its primary
> downfall now is that it's simply not being used a lot. Adding further
> barriers to using it would only serve to make this worse.
Smylers wrote:
> ...
> The modal score for individual ratings is clearly 5 stars ...
> It looks like the next most-common rating is 1 star, for the people
> panning the completely worthless modules.
Star ratings are for people who like pie charts. The star ratings in themselves (i.e. not including the text in the reviews) can be unreliable not just because of intentional abuse but because of vastly different perspectives on what the stars mean.
Cases in point:
http://cpanratings.perl.org/d/SuperPython
http://cpanratings.perl.org/d/CGI-NeedSSL
The relative star ratings of two modules rated by different small sets of people are probably useless in any statistical sense. I'd rather see comparable lists of exactly what the essential factors of each module are, something like Paul Hoffman did in the POD of his proposed Search::TokenIndex:
Paul Hoffman wrote:
> Advantages:
>
>     Simple API, easy to understand
>     Easy to get started if you just want to index text files
>     Versatile -- can index words, soundex codes, things other
>         than text
>     Reasonably fast -- indexing is O(n) where n is the number of
>         "tokens" in the set of indexed documents, and searching is
>         O(m) where m is the number of tokens in the search string
>     Can use tied hash for storage
>     No prerequisites
>     Easy to (un)serialize indexes
>
> Disadvantages:
>
>     Doesn't rank search results
>     Doesn't remember where tokens occur in a document
>     Doesn't know anything about proximity searching
>     Probably doesn't scale well to large document sets
>     Works strictly at the byte level -- doesn't know anything
>         about languages or encodings
>     No Boolean searching (everything is AND)
>     Doesn't rank search results
>     Did I mention it doesn't rank search results?
This of course would be improved if it included feedback from people other than just the author. Further, some points (e.g. "It uses an OO-ish interface") are neither advantages nor disadvantages -- they are just *factors* whose merits the potential module user can judge for himself or herself. In fact, if we remove the headings "Advantages" and "Disadvantages" above, the list of factors becomes much more objective than subjective.
Mark Stosberg wrote:
> I think it would be quite sufficient to have some way to flag reviews
> (or users) that appear to be abusive.
s/abusive/helpful/. Amazon.com allows reviews to be rated via a "Was this review helpful to you?" feature. I would want the most helpful reviews to be listed near the top of the list of reviews on cpanratings.perl.org, so that I preferably only need to scan a small set of very educated reviews (assuming cpanratings.perl.org had many reviews of each module, which it does not). Although I would allow a module author to review his or her own module, I would not allow a reviewer to rate his or her own review ;)
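To make the idea concrete, here is a minimal sketch of ordering reviews by helpful-vote fraction. Everything here is invented for illustration (the Review fields, the scoring function); a real site would want a confidence-adjusted score so one lucky 1-of-1 vote doesn't outrank 90-of-100:

```python
from dataclasses import dataclass

@dataclass
class Review:
    author: str
    text: str
    helpful: int   # "yes, this was helpful" votes
    total: int     # total helpfulness votes cast

def helpfulness(r: Review) -> float:
    # Naive fraction-helpful score; reviews with no votes sort last.
    return r.helpful / r.total if r.total else 0.0

def sort_reviews(reviews: list[Review]) -> list[Review]:
    # Most helpful first, so readers can stop after the top few.
    return sorted(reviews, key=helpfulness, reverse=True)
```

The self-review restriction above would simply mean rejecting a helpfulness vote whose voter matches the review's author.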
For search-engine ranking purposes, there are more automatic ways to rate modules. Ratings could be derived from a number of factors, such as the number of modules using a given module, conformance to measurable quality standards (with respect to the Changelog, README, POD, and test suite), the number of downloads, etc. Ratings can also be transitive: if a module is used by many highly rated modules, its own rating is boosted. This is basically no different from the approach of Internet search engines, which use no global manual rating system. Automatic rankings are better than nothing, but they will likely never be as good as manual rankings could be; yet I fear the current star ratings are not being maintained in a way that makes them reliable enough to improve searching as suggested. Also, am I correct that search.cpan.org currently ranks only on keywords?
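The transitive-rating idea can be sketched as a PageRank-style iteration over the module dependency graph: a module used by many highly rated modules accumulates rating from them. The dependency data and function names below are invented for illustration; this is only a sketch of the technique, not anything CPAN actually does:

```python
def transitive_rank(deps, iterations=50, damping=0.85):
    """deps maps each module to the list of modules it uses."""
    modules = set(deps) | {m for used in deps.values() for m in used}
    n = len(modules)
    rank = {m: 1.0 / n for m in modules}
    for _ in range(iterations):
        # Every module keeps a small baseline rating.
        new = {m: (1 - damping) / n for m in modules}
        # Each module passes a share of its rating to the modules it uses.
        for user, used in deps.items():
            if used:
                share = damping * rank[user] / len(used)
                for m in used:
                    new[m] += share
        # Modules that use nothing redistribute their rating evenly,
        # so the total stays normalized.
        sinks = sum(rank[m] for m in modules if not deps.get(m))
        for m in modules:
            new[m] += damping * sinks / n
        rank = new
    return rank
```

With a toy graph where both App and Tool use DBI but only App uses CGI, DBI ends up ranked above CGI, which in turn outranks the leaf applications nobody depends on.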
Smylers wrote: > But yes, as the CGI::Lite maintainer I do have an interest in a review > of CGI-related modules: I'd like it to put people off using CGI::Lite so > that I can stop trying to maintain it and everybody can use something > saner instead ...
Of course, we'd never know that from the current POD and ratings system ;) I've wondered myself whether I should move some CGI programs to CGI::Lite for improved performance.
Rocco Caputo wrote:
> On Tue, Feb 10, 2004 at 08:19:14PM +0000, Smylers wrote:
> ...
>> * Others can use those requirements to review further modules.
>>
>> * Somebody could later add another requirement and only has to check
>>   out each of the modules for that to augment the review: it isn't
>>   necessary to start from scratch.
>
> I think your last two points are the most important. Peer review and
> incremental improvement make the review process somewhat collaborative
> and self-correcting. It should eventually ensure a set of reviews that
> are fair and comprehensive. Or at least engender some interesting
> arguments.
I concur. I see this process as iterative as well, with feedback on modules being made public and slowly being incorporated into a more formal review of an entire class of modules.
-davidm
