Take a look at my ApacheCon example code at
http://www.cnlp.org/apachecon2005. In particular there is some sample
code in the file IndexAnalysis.java that demonstrates what Karl is
talking about. I don't think it is exactly what you want, but it shows
how to get co-occurrence information from the Index. You may be able to
use it as a starting point.
karl wettin wrote:
On Wed, 2006-05-10 at 10:26 -0700, Xiaocheng Luan wrote:
Is there any Lucene tools
Not that I know.
(or general tools/algorithms) that can compute the co-occurrence terms
for a given query (or term)?
Might be slow, but you can work the TermFreqVector. It would probably be
best to store this data in an alternative index.
I would start with making it an all in memory index using Maps and hard
links. Then use your favorite object mapping layer to store the
information. Perhaps java.io.Serializable is enough.
Weka is a really nice data mining library. You should post the same
question to them, and tell them what you try to achieve with this data.
Perhaps they have some really nice classifier for you.
Feel free to report back here.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]