Jason,

You can look here:
http://www.cs.ualberta.ca/~lindek/downloads.htm
for word frequency counts from a 1.5B-word corpus (TREC disks 1-5 and the Reuters corpus <http://about.reuters.com/researchandstandards/corpus/>).

The words are normalized as follows: ALL-CAPS words are prefixed with a_ and Capitalized words are prefixed with c_ after downcasing. All digits are replaced with 0.

Cheers,
Boris

On 8/30/06, Jason Pump <[EMAIL PROTECTED]> wrote:
Is there a large list of words and their frequency in the English language? Obviously it would differ by corpus, but I would like to see what's already available.
-- Thanks, Boris
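
To look up a word in those counts you would first have to map it to the same normalized form. Here is a minimal sketch of the normalization as described in the reply, not the code actually used to build the list. The function name `normalize` is made up for illustration, and several details are assumptions because the message does not spell them out: that downcasing applies to both the a_ and c_ cases, that a single capital letter such as "I" counts as Capitalized, and that mixed-case tokens such as "iPhone" are left unchanged.

```python
import re


def normalize(token: str) -> str:
    """Map a raw token to the key format described in the message.

    Hypothetical helper; the edge-case handling is assumed, not documented.
    """
    # All digits are replaced with 0.
    token = re.sub(r"[0-9]", "0", token)

    # ALL-CAPS words: downcase and prefix with a_.
    # Assumption: single-letter tokens like "I" are treated as Capitalized.
    if len(token) > 1 and token.isupper():
        return "a_" + token.lower()

    # Capitalized words: downcase and prefix with c_.
    if token[:1].isupper():
        return "c_" + token.lower()

    # Assumption: lowercase and mixed-case tokens are returned as they are.
    return token


if __name__ == "__main__":
    for word in ["TREC", "Reuters", "corpus", "1995"]:
        print(word, "->", normalize(word))
```

Check a few entries in the downloaded file before relying on this, since the real data may treat the edge cases differently.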