On Fri, Jul 05, 2013 at 05:17:54PM +0100, Jonathan Wakely wrote:
> On 5 July 2013 16:43, Ondřej Bílka wrote:
> >
> > Hi, I ran aspell on comments in gcc. After bit of cleaning a list with
> > frequencies is here. It is still relatively noisy and more heuristics
> > are needed.
> >
> > http://kam.mff.cuni.cz/~ondra/gcc_misspells
> >
> > What we will do with this now?
>
> It doesn't look very useful yet, clearly "namespace" and "param" are not
> errors.

We need to teach aspell about these. I am thinking about creating a shared
wordlist that gcc developers will use. It is mainly a logistics problem; I
could imagine having a shared file on sourceware and using a script like this:

scp remote_wordlist wordlist
aspell merge english wordlist
aspell -m wordlist -p new
scp remote_wordlist wordlist # To decrease race conditions.
aspell merge wordlist new
scp wordlist remote_wordlist

> "acccepted" and "accestor" and "actullay" are real spelling mistakes,
> but someone will have to do a grep through the whole tree to see where
> they come from, and then ignore all the ones in ChangeLog files.

If I could extract the score that aspell uses to pick its candidates, I could
sort the words so that the most likely misspellings come first. I wrote to
aspell-user about this but have not received a response yet.

This touches only comments, not changelogs.

Ondra
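
P.S. The merge step in the script above could be roughly sketched without
aspell at all, assuming the shared wordlist is a plain text file with one
word per line (an assumption; aspell's own personal dictionary format is
different). The function name merge_wordlists is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of aspell or gcc:
#   merge_wordlists OUT IN...
# writes the sorted union of all words in the IN files to OUT
# (one word per line, duplicates dropped).
merge_wordlists() {
    out=$1
    shift
    # Write to a temporary file first so OUT may also be one of the inputs.
    tmp=$(mktemp) || return 1
    # LC_ALL=C gives a stable byte-wise order regardless of the user's locale.
    LC_ALL=C sort -u -- "$@" > "$tmp" && mv -- "$tmp" "$out"
}
```

With this, the second download and the final upload would bracket a call such
as "merge_wordlists wordlist wordlist new". Two developers uploading at nearly
the same moment can still overwrite each other, so this only narrows the race
window; it does not close it.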