Hi,

Any ideas on how to implement this are appreciated.

"Frequency Analysis of English Vocabulary and Grammar: Based on the LOB
Corpus" by Stig Johansson and Knut Hofland (OUP, 1989, ISBN 0-19-824221-2)
gives the top eighteen words and their frequencies as:

     1. the    68315
     2. of     35716
     3. and    27856
     4. to     26760
     5. a      22744
     6. in     21108
     7. that   11188
     8. is     10978
     9. was    10499
    10. it     10010
    11. for     9299
    12. he      8776
    13. as      7337
    14. with    7197
    15. be      7186
    16. on      7027
    17. I       6696
    18. his     6266

If the body contains an http:, ftp: or https: link, I want to test it
further; otherwise, skip this test.

The test is as follows: check each paragraph that does not contain any of
the above 18 words (paragraphs are separated by \n).

1. For each paragraph without common English words, assign a score.
2. For each paragraph containing words with 0-9, ' or " (anywhere), or
   : or ~ (in the middle), assign a score based on the number of matches.

Thanks,
Murty Rompalli
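
One possible way to implement the test described above is sketched here in Python. This is only a rough illustration under several assumptions that are not in the original request: the names (score_body, has_common_word, is_odd_word) and the two weights are made up, words are split on whitespace with surrounding punctuation stripped, and step 2 is applied only to paragraphs that already failed the common-word check.

```python
import re
import string

# The eighteen most frequent words from the list above, lowercased.
COMMON_WORDS = frozenset(
    "the of and to a in that is was it for he as with be on i his".split()
)

# The test only runs when the body contains one of these link schemes.
LINK_RE = re.compile(r"\b(?:https?|ftp):", re.IGNORECASE)

# Digits and quote characters count anywhere in a word.
ANYWHERE_RE = re.compile(r"[0-9'\"]")

# Hypothetical weights; the original message does not give any values.
NO_COMMON_WORD_SCORE = 10
PER_ODD_WORD_SCORE = 1


def has_common_word(paragraph):
    """True if any whitespace-separated token is one of the 18 common words."""
    for token in paragraph.split():
        if token.strip(string.punctuation).lower() in COMMON_WORDS:
            return True
    return False


def is_odd_word(word):
    """True if the word has 0-9, ' or " anywhere, or : or ~ in the middle."""
    if ANYWHERE_RE.search(word):
        return True
    # "Middle" is taken to mean: not the first and not the last character.
    return any(ch in ":~" for ch in word[1:-1])


def score_body(body):
    """Score a message body; returns 0 when the body contains no link."""
    if not LINK_RE.search(body):
        return 0
    score = 0
    for paragraph in body.split("\n"):
        if not paragraph.strip() or has_common_word(paragraph):
            continue
        # Step 1: the paragraph has none of the common English words.
        score += NO_COMMON_WORD_SCORE
        # Step 2: add a smaller amount per suspicious-looking word.
        odd_words = sum(1 for w in paragraph.split() if is_odd_word(w))
        score += PER_ODD_WORD_SCORE * odd_words
    return score
```

Note that with this reading a URL on a line of its own is scored as well, since it contains none of the common words and has a ":" in the middle. Whether that is wanted, and what the weights should be, would need tuning against real mail.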