> > - Better package search mechanism (tags?) allowing free text search
> > in package management interfaces: "I want a program that does X"
>
> Doesn't 'apt-cache search X' do exactly that?
[ Here's the in-depth answer from my POV ]

Think of an *end* user who wants to find the most popular multi-user games in Debian (maybe to play with fellow Debianites). You're saying he has to:

[ open a terminal in his X session; yes, he could use synaptic, but see below ]

$ apt-cache search multiuser game
[ shows only a few packages, including libraries, which the user is not interested in ]
$ apt-cache search multi-user game
[ different output due to the keyword change ]
$ apt-cache search multi user game
[ a mix of the above ]
$ apt-cache search multi player game
[ different packages, including library and data packages which he will never install directly ]

And in all these cases he still wouldn't be able to tell which ones other Debian users use most. He would need to feed in the popcon data for that. Scripting, anyone? [1]

What the user in this example really wants to see is:

- end-user packages (not library packages or data packages pulled in through dependencies)
- sorted by their popularity (i.e. installed base)
- one click away from installation

No package frontend I am aware of can currently pull off that stunt. Aptitude and dselect can only search in the package names ('/' key). Synaptic can search in the descriptions (with results equivalent to apt-cache search).

Moreover, text-based searches in a free-text area are not useful when all words have the same weight (i.e. there are no keywords). In order to do proper searches you need automatic language analysis algorithms that add weight to words (like TF-IDF [2]).

Consider another example: a user wants to find a good mail reader for his graphical environment (he's actually looking for an application like Thunderbird or Evolution). How should he conduct the search? 'apt-cache search mail reader'? That's 77 packages he needs to sort out manually. 'apt-cache search mail reader graphic'? That lists only two packages, neither of which fits his search.

Have I made the issue clear now?
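The word weighting mentioned above (TF-IDF) can be illustrated with a minimal Perl sketch that ranks package descriptions against a free-text query. This is only the textbook tf × idf score, not the dpkg-iasearch code or any existing Debian tool; the helper names `tokenize` and `rank_tfidf` and the package names and descriptions are hypothetical.

```perl
#!/usr/bin/perl
# Minimal TF-IDF ranking sketch. Package names and descriptions used
# below are made up for illustration, not real apt-cache output.
use strict;
use warnings;

# Split a text into lower-case word tokens.
sub tokenize {
    my ($text) = @_;
    return map { lc($_) } ($text =~ /(\w+)/g);
}

# Rank documents (name => description) against a free-text query.
# Each query term contributes tf(term, doc) * idf(term), where
#   tf  = occurrences of the term in the doc / number of words in the doc
#   idf = log(N / number of docs containing the term)
# Returns names sorted by descending score; zero scores are dropped.
sub rank_tfidf {
    my ($query, %docs) = @_;
    my $n = scalar keys %docs;
    my (%tf, %df);
    for my $name (keys %docs) {
        my @words = tokenize($docs{$name});
        my %count;
        $count{$_}++ for @words;
        for my $w (keys %count) {
            $tf{$name}{$w} = $count{$w} / @words;
            $df{$w}++;
        }
    }
    my %score;
    for my $name (keys %docs) {
        for my $term (tokenize($query)) {
            next unless $df{$term} && $tf{$name}{$term};
            $score{$name} += $tf{$name}{$term} * log($n / $df{$term});
        }
    }
    return sort { $score{$b} <=> $score{$a} || $a cmp $b }
           grep { $score{$_} > 0 } keys %score;
}

# Example with made-up descriptions:
my %docs = (
    'gui-mailer'    => 'graphical mail reader and news reader',
    'libmail-parse' => 'library for parsing mail messages',
    'strategy-game' => 'multi player strategy game',
);
print join(' ', rank_tfidf('mail reader', %docs)), "\n";
# prints "gui-mailer libmail-parse"
```

A rare word such as "reader" weighs more than "mail", which appears in several descriptions, so the package that actually is a mail reader ranks first instead of being one hit among many.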
With all the software we currently have in Debian it's *very* difficult for novice users to find what they are looking for. They end up reading elsewhere (e.g. Google) about which tools are good for them and then look for them in Debian, instead of searching for them with the tools they have in Debian first.

Finally, if you review the above you'll see that I haven't mentioned i18n issues, but those are important too. Users can use a fully i18nized system, except for the Debian package management system itself, which is English-only.

The more software we have, the more difficult it is for users to search in it using the current (crude) tools we provide.

Regards

Javier

[1] Attached to this mail is a script that implements this. You can see that adding popcon data to the search helps, but it still doesn't cut it, since it will still show 'library' and 'data' packages which an end user will rarely install on their own.

[2] I actually implemented this through a hack called 'dpkg-iasearch' which didn't catch much attention. I didn't have time to work more on it, but it did allow for free-text (non-keyword) searches, using TF-IDF to group package descriptions into clusters.
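Footnote [1] notes that library and data packages still pollute the popcon-sorted results. A rough Perl sketch of such a filter follows, assuming records in the `Package:`/`Section:` layout that `apt-cache show` prints. The section list (libs, libdevel, oldlibs, debug), the name suffixes, and the helpers `parse_sections` and `is_end_user_package` are a hypothetical heuristic chosen for illustration, not Debian policy, and it will misclassify some packages.

```perl
#!/usr/bin/perl
# Heuristic end-user package filter (sketch). The section list and the
# name suffixes below are assumptions, not Debian policy.
use strict;
use warnings;

# Sections whose packages an end user rarely installs directly (assumed list).
my %NON_END_USER_SECTION = (libs => 1, libdevel => 1, oldlibs => 1, debug => 1);

# Parse "Package:"/"Section:" fields from apt-cache show style records
# (stanzas separated by blank lines). Returns package => section.
sub parse_sections {
    my ($text) = @_;
    my %section;
    for my $stanza (split /\n\s*\n/, $text) {
        my ($pkg) = $stanza =~ /^Package:\s*(\S+)/m;
        my ($sec) = $stanza =~ /^Section:\s*(\S+)/m;
        $section{$pkg} = $sec if defined $pkg && defined $sec;
    }
    return %section;
}

# Return 1 if the package looks like something a user installs on purpose.
sub is_end_user_package {
    my ($name, $section) = @_;
    $section = '' unless defined $section;
    $section =~ s{^.*/}{};    # "non-free/libs" -> "libs"
    return 0 if $NON_END_USER_SECTION{$section};
    return 0 if $name =~ /-(?:data|common|dev|dbg|doc)$/;    # support packages
    return 1;
}

# Example with made-up records:
my $sample = "Package: foo-game\nSection: games\n\n"
           . "Package: libfoo1\nSection: libs\n\n"
           . "Package: foo-game-data\nSection: games\n";
my %sections = parse_sections($sample);
my @end_user = grep { is_end_user_package($_, $sections{$_}) } sort keys %sections;
print "@end_user\n";    # prints "foo-game"
```

Applied to the search results before sorting them by popcon votes, a filter like this would bring the output closer to the "end-user packages only" list described above.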
```perl
#!/usr/bin/perl -w
#
# Popular packages search
# (c) 2005 Javier Fernandez-Sanguino
#
# Run an 'apt-cache search' query and order the packages by popularity.
# You first need to retrieve the popcon data, use:
#
#  wget -O all-popcon-results.txt.gz http://popcon.debian.org/all-popcon-results.txt.gz
#
# Usage:
#  - popular-packages.pl -p all-popcon-results.txt.gz "my query"
#    Show all packages matching "my query", sorted by popularity
#
# --------------------------------------------------------------------------
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
# You can also find a copy of the GNU General Public License at
# http://www.gnu.org/licenses/licenses.html#TOCLGPL
#
# --------------------------------------------------------------------------

use strict;
use Getopt::Std;
use IO::Handle;

my $POPCONURL = "http://popcon.debian.org/all-popcon-results.txt.gz";

# -h = print help
# -p = popularity contest file
# -v = verbose
my %opts;
getopts('hvp:', \%opts) or do { usage(); exit 1 };

# Package globals (not lexicals) so that the formats below can see them.
our ($package, %packagelist, %popularity);

format popularity_top =
Packages sorted by popularity
Name                   Popularity
---------------------------------
.

format popularity =
@<<<<<<<<<<<<<<<<<<<<< @<<<<<<<
$package, $popularity{$package}
.

if ($opts{h}) {
    usage();
    exit 0;
}

# All remaining arguments form the query; split on whitespace so that
# "multi user game" becomes three search terms, as the shell did before.
my @terms = split ' ', join(' ', @ARGV);
if (!@terms) {
    print STDERR "$0: Give me something to search for!\n";
    usage();
    exit 1;
}
if (!defined $opts{p}) {
    print STDERR "$0: You should provide a popularity contest data file!\n";
    usage();
    exit 1;
}

# Use apt-cache search; the list form of open keeps the query away from the shell.
open(my $query_fh, '-|', 'apt-cache', 'search', @terms)
    or die "$0: Cannot run apt-cache: $!";
while (<$query_fh>) {
    chomp;
    print STDERR "\tParsing search result: '$_'\n" if $opts{v};
    # Each result line looks like "package - short description"
    if (/^\s*(\S+)\s+-\s+(.*)$/) {
        print STDERR "\tAdding package $1 to the list\n" if $opts{v};
        $packagelist{$1} = $2;
        $popularity{$1}  = 0;
    }
}
close $query_fh;

# Download the popcon data from $POPCONURL
my $popcon_file = $opts{p};
-f $popcon_file or die "$0: File $popcon_file does not exist\n";
open(my $pop_fh, '-|', 'zcat', '-f', '-c', $popcon_file)
    or die "$0: Cannot uncompress $popcon_file: $!";
while (<$pop_fh>) {
    # Format is: [Package:] name #Votes #Old #Recent #Unknown
    chomp;
    if (/^(?:Package:\s+)?(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/) {
        # Only keep data for packages returned by the search
        next unless exists $packagelist{$1};
        print STDERR "\tPopularity for $1 is $2\n" if $opts{v};
        $popularity{$1} = $2;
    }
}
close $pop_fh;

STDOUT->format_name('popularity');
STDOUT->format_top_name('popularity_top');
foreach $package (sort { $popularity{$b} <=> $popularity{$a} || $a cmp $b }
                  keys %packagelist) {
    write;
}
exit 0;

sub usage {
    print "Usage: $0 -p popcon_data \"text query\"\n";
    print "\t-p\tPopularity contest data file\n";
    print "\t\tDownload from $POPCONURL\n";
    print "\t-v\tBe more verbose\n";
    print "\t-h\tShow this help\n";
}
```