Take a look at my ApacheCon example code at http://www.cnlp.org/apachecon2005. In particular there is some sample code in the file IndexAnalysis.java that demonstrates what Karl is talking about. I don't think it is exactly what you want, but it shows how to get co-occurrence information from the Index. You may be able to use it as a starting point.

karl wettin wrote:
On Wed, 2006-05-10 at 10:26 -0700, Xiaocheng Luan wrote:
Are there any Lucene tools

Not that I know.

(or general tools/algorithms) that can compute the co-occurrence terms
for a given query (or term)?

Might be slow, but you can work with the TermFreqVector. It would probably be
best to store this data in an alternative index.

I would start by making it an entirely in-memory index using Maps and hard
links. Then use your favorite object mapping layer to store the
information. Perhaps java.io.Serializable is enough.
Weka is a really nice data mining library. You should post the same question to them, and tell them what you are trying to achieve with this data. Perhaps they have a really nice classifier for you.
Feel free to report back here.
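
To make the in-memory Maps idea above a bit more concrete, here is a minimal sketch, not anything taken from Lucene or from IndexAnalysis.java. It assumes you have already pulled the terms of each document out of the index as a String[] (for example, what TermFreqVector.getTerms() would give you per document), and it counts in how many documents each pair of terms appears together. The class name CooccurrenceSketch and the methods count, save and load are made up for illustration. Plain java.io serialization is used for storage, as suggested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
public class CooccurrenceSketch {

    // For each term, counts the number of documents in which it
    // appears together with each other term.
    public static HashMap<String, HashMap<String, Integer>> count(List<String[]> docs) {
        HashMap<String, HashMap<String, Integer>> counts = new HashMap<>();
        for (String[] terms : docs) {
            // De-duplicate so a pair is counted once per document.
            Set<String> unique = new HashSet<>(Arrays.asList(terms));
            for (String a : unique) {
                for (String b : unique) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // HashMap, String and Integer are all Serializable, so the whole
    // structure can be written out as-is.
    public static byte[] save(HashMap<String, HashMap<String, Integer>> counts)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(counts);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public static HashMap<String, HashMap<String, Integer>> load(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, HashMap<String, Integer>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> docs = Arrays.asList(
            new String[] {"lucene", "index", "search"},
            new String[] {"lucene", "index", "term"},
            new String[] {"weka", "classifier"});
        // Round-trip through serialization, then look up one term.
        HashMap<String, HashMap<String, Integer>> counts = load(save(count(docs)));
        System.out.println("co-occurring with 'lucene': " + counts.get("lucene"));
    }
}
```

The nested loop is quadratic in the number of distinct terms per document, so for a large index you would probably want to restrict it to the terms you actually care about, or to the documents matching a query.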



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886

