Ben Finney writes ("Re: Removing duplication: Word lists of common words in languages"):
> Where is a good authoritative source of such words, by frequency, for
> various natural languages, suitable for inclusion in Debian as a data
> package?

I had roughly this question in 2013, and found the answer. Here is
probably the best starting point:

  http://www.chiark.greenend.org.uk/ucgi/~ijackson/git?p=evade-mail-usrlocal.git;a=blob;f=lemma.al-permission.mbox

Ian.