(forwarding this to ubuntu-devel-discuss and Zygmunt)

On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:
> Package: command-not-found
> Severity: wishlist
>
> I re-wrote command-not-found to get rid of the python dependency, and
> to reduce the database size, so as to reduce memory usage.
>
> https://github.com/shawnl/command-not-found
>
> I was preparing to upload it to mentors as command-not-found-ng

I also rewrote it years ago, in C, but using the same database format.
It was a lot faster. I don't understand the memory usage bit - it should
not matter how large the database is: it's memory mapped, not read into
memory, so memory usage should be roughly constant.

Questions/Comments for your approach:

* Did you test your format on a slow HDD with caches dropped? It must
  not be slower than the Python one (that one is way too slow already)
  - I did, it seems to be faster (0.4 vs 0.68 seconds)
  - I believe the database-based C rewrite was even much faster,
    though.

* update-command-not-found should use apt-get indextargets

* You don't store components, hence you cannot tell people to enable
  a component. That's a very important use case for Ubuntu, where not
  all components are enabled by default, but the database is shipped
  in the package.

  You could just append /<component> to each package name I think,
  and strip it away when displaying.

* You should use getopt_long() to parse command-line options, and
  support -h, --help :)

* pts_lbsearch belongs in usr/lib/..., not usr/share/...

* You don't implement a closest matches function:

    $ command-not-found thunderbrd
    No command 'thunderbrd' found, did you mean:
     Command 'thunderbird' from package 'thunderbird' (main)
    thunderbrd: command not found

    $ ./command-not-found thunderbrd
    thunderbrd: command not found

  This one is really important. People do make typos or misremember
  command names, so the tool needs to be able to deal with that.

  Should be easy to implement though, although you might have to
  search multiple times - once for each alternative.
  All you need is

    def similar_words(word):
        """ return a set with spelling1 distance alternative spellings

            based on http://norvig.com/spell-correct.html"""
        alphabet = 'abcdefghijklmnopqrstuvwxyz-_'
        # every way to split the word into a prefix and a suffix
        s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in s if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
        inserts = [a + c + b for a, b in s for c in alphabet]
        return set(deletes + transposes + replaces + inserts)

  And search for what that returns. And you don't need to search for
  those at all if you have a direct match.

* It needs to be translated - also very important.

* You need to Conflict with command-not-found, not Break it, AFAIUI

* You should not depend on grep, sed, coreutils; they are Essential.

* You do have to Depend on apt-file, as that configures apt to
  download the Contents files

* You should not have identifiers starting with _ in the program;
  these are reserved for the C implementation (like _cleanup_free_).

Yes, and these are basically the same reasons my C prototype is not in
the archive. Also, I did not put a lot of work into it, as I was
waiting for PackageKit to take that over, but that has not happened
yet.

I think it's a worthwhile approach, and I can see it replacing
command-not-found once those tiny issues have been fixed. Then you
could also avoid the -ng moniker, and just take over the main package
(if Zygmunt does not mind), which also avoids a month-long NEW
process :)

-- 
Debian Developer - deb.li/jak | jak-linux.org - free software dev
Ubuntu Core Developer
de, en speaker
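
To make the closest-match and component points above a bit more concrete, here is a rough sketch of the lookup order I have in mind: try the direct match first, and only fall back to the distance-1 alternatives if that finds nothing. The names lookup() and DATABASE, and the dict standing in for the real database, are made up for illustration; the only assumption taken from the suggestions above is that each entry is stored as "package/component".

```python
def similar_words(word):
    """Return the set of spellings at edit distance 1 from word."""
    alphabet = "abcdefghijklmnopqrstuvwxyz-_"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def lookup(command, database):
    """Return (is_exact, [(command, package, component), ...]).

    database is a hypothetical mapping from command name to a list of
    "package/component" strings; the real tool would query its own
    on-disk format instead.
    """
    def entries(name):
        # Strip the component off the stored value so it can be shown
        # separately, e.g. "thunderbird/main" -> ("thunderbird", "main").
        return [(name, *value.rsplit("/", 1)) for value in database.get(name, [])]

    exact = entries(command)
    if exact:
        # A direct match means the alternatives are never searched.
        return True, exact

    suggestions = []
    for candidate in sorted(similar_words(command)):
        # One search per alternative spelling.
        suggestions.extend(entries(candidate))
    return False, suggestions


# Toy data, invented for this example only.
DATABASE = {
    "thunderbird": ["thunderbird/main"],
    "htop": ["htop/universe"],
}

if __name__ == "__main__":
    exact, matches = lookup("thunderbrd", DATABASE)
    if not exact and matches:
        print("No command 'thunderbrd' found, did you mean:")
        for name, package, component in matches:
            print(f" Command '{name}' from package '{package}' ({component})")
```

The alternatives set grows with the length of the word, so this means a few hundred lookups for a typical command name; whether that is acceptable depends on how cheap a single lookup is in the new format.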