Hi,

Currently we are using SQLite's built-in Porter stemming tokenizer, which by default stems every keyword while indexing. It does this by removing suffixes such as 's', 'es', 'ing' and 'ed' from the ends of words, along with various other similar heuristics. This is useful for full text search because if you search for 'directories', you will find matches for both 'directory' and 'directories'.
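
To make the stemming behaviour concrete, here is a small sketch using Python's sqlite3 module rather than the actual makemandb(8) code. It assumes the SQLite library behind Python was built with FTS4 support; the table name "pages", the column "body" and the sample rows are made up for illustration.

```python
import sqlite3


def search(conn, query):
    """Return the bodies of all rows matching an FTS query, in insertion order."""
    rows = conn.execute(
        "SELECT body FROM pages WHERE pages MATCH ? ORDER BY docid", (query,)
    )
    return [row[0] for row in rows]


# In-memory database with an FTS4 table that uses the built-in Porter tokenizer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts4(body, tokenize=porter)")
conn.executemany(
    "INSERT INTO pages(body) VALUES (?)",
    [
        ("list the directory",),
        ("remove empty directories",),
        ("print a file",),
    ],
)

# Both the indexed text and the query go through the stemmer, so either
# spelling finds both rows.
print(search(conn, "directories"))
print(search(conn, "directory"))
```

Both queries should return the first two rows, because 'directory' and 'directories' are reduced to the same stem at index time and at query time.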

But the downside is that technical keywords (e.g. kms, lfs, ffs) are also stemmed, and it is the stemmed form (e.g. km, lf, ff) that is stored in the index. So if you search for kms, you will see results for both kms and km.

The solution is to write a custom tokenizer that checks an ignore list to decide whether or not to stem a token. I'm looking for the best way to obtain this ignore list of keywords. The discussion on current-users [1] had two suggestions:

1. If a word is not in /usr/share/dict/words, don't stem it.
2. Look for .Tn macros (and probably other similar macros) and don't stem those.

Doing (1) is simple, but that file is huge, and it would require building a huge hash table to look up every keyword while parsing the man pages. With (2), the list will not be available before makemandb(8) runs, so it is hard to implement.

Another option is to build a list by hand, using /usr/data/src/usr.bin/spell/spell/{special.netbsd, special.math} as a starting point.

If you have any better alternatives, please let me know :)

[1]: http://mail-index.netbsd.org/current-users/2016/07/08/msg029732.html

- Abhinav
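
P.S. For illustration only, the ignore-list check could look roughly like the sketch below. It is written in Python for brevity; the real tokenizer would have to be written in C against SQLite's tokenizer interface. Every name here (IGNORE, toy_stem, index_token) is hypothetical, the three-word list is a placeholder for whatever list we end up with, and toy_stem implements only step 1a of the Porter algorithm (plural suffixes) as a stand-in for the real stemmer.

```python
# Placeholder ignore list; the real one might be seeded from the spell(1)
# special.* files or built in some other way.
IGNORE = frozenset({"kms", "lfs", "ffs"})


def toy_stem(word):
    """Stand-in stemmer: only step 1a of the Porter algorithm."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # directories -> directori
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # disks -> disk
    return word


def index_token(word, ignore=IGNORE, stem=toy_stem):
    """Return the form of `word` to store in the index.

    Words on the ignore list are lowercased but otherwise left alone;
    everything else is passed to the stemmer.
    """
    token = word.lower()
    if token in ignore:
        return token
    return stem(token)


for w in ("kms", "lfs", "directories", "Disks"):
    print(w, "->", index_token(w))
```

The same function would have to be applied to query terms as well, so that a search for kms is compared against the unstemmed kms in the index.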