Hi, I am trying to use mutual information to measure the correlation between different terms in documents. With millions of documents, however, calculating the mutual information is too slow. Has anybody built a high-performance solution for this? I found the message below in the mailing list archive, but it provides no details: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200407.mbox/[EMAIL PROTECTED]
Any hints, papers, or even just keywords to help me search on Google are welcome! Many thanks, -Qi