Re: (Lucene) tools/algorithms for co-occurrence terms computation

Grant Ingersoll Wed, 10 May 2006 11:37:08 -0700

Take a look at my ApacheCon example code at http://www.cnlp.org/apachecon2005. In particular, there is some sample code in the file IndexAnalysis.java that demonstrates what Karl is talking about. I don't think it is exactly what you want, but it shows how to get co-occurrence information from the index. You may be able to use it as a starting point.


karl wettin wrote:

On Wed, 2006-05-10 at 10:26 -0700, Xiaocheng Luan wrote:

Are there any Lucene tools


Not that I know of.

(or general tools/algorithms) that can compute the co-occurrence terms
for a given query (or term)?


It might be slow, but you can work with the TermFreqVector. It would probably be
best to store this data in an alternative index.
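
As a rough illustration of the term-vector idea, here is a minimal sketch in plain Java. It assumes you have already pulled the terms of each document out of the index (for example from each document's term vector); the String[] arrays below stand in for that data. The class name CooccurrenceCounter and its count method are hypothetical and not part of Lucene.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: counts document-level co-occurrence.
class CooccurrenceCounter {

    // For every term, count the number of documents in which it appears
    // together with each other term. Each String[] is assumed to hold the
    // terms of one document (e.g. taken from that document's term vector).
    static Map<String, Map<String, Integer>> count(List<String[]> docs) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] terms : docs) {
            // De-duplicate so a document contributes at most once per term pair.
            TreeSet<String> unique = new TreeSet<>(Arrays.asList(terms));
            for (String a : unique) {
                for (String b : unique) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            new String[] {"lucene", "index", "search"},
            new String[] {"lucene", "index"},
            new String[] {"weka", "classifier"});
        // Prints the terms that co-occur with "lucene" and their document counts.
        System.out.println(count(docs).get("lucene"));
    }
}
```

The pairwise loop is quadratic in the number of distinct terms per document, which is one reason this might be slow on large documents.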

I would start by making it an all-in-memory index using Maps and hard
links. Then use your favorite object mapping layer to store the
information. Perhaps java.io.Serializable is enough.
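
A minimal sketch of the "Serializable is enough" option, assuming the in-memory index is a HashMap from term to a HashMap of co-occurring terms and counts (HashMap, String and Integer all implement java.io.Serializable). The class name CooccurrenceStore is hypothetical, and a byte array is used here in place of a file only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical helper: stores the in-memory co-occurrence map with plain
// Java serialization.
class CooccurrenceStore {

    // Serialize the whole map to bytes; a FileOutputStream could be
    // substituted for the ByteArrayOutputStream to write to disk.
    static byte[] save(HashMap<String, HashMap<String, Integer>> counts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(counts);
        }
        return bytes.toByteArray();
    }

    // Read the map back; the cast is unchecked because serialization
    // does not retain generic type information.
    @SuppressWarnings("unchecked")
    static HashMap<String, HashMap<String, Integer>> load(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, HashMap<String, Integer>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, HashMap<String, Integer>> counts = new HashMap<>();
        counts.computeIfAbsent("lucene", k -> new HashMap<>()).put("index", 2);
        HashMap<String, HashMap<String, Integer>> restored = load(save(counts));
        System.out.println(restored.equals(counts));
    }
}
```

This reads and writes the whole map at once, so it only suits data that fits comfortably in memory.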

Weka is a really nice data mining library. You should post the same question to them, and tell them what you are trying to achieve with this data. Perhaps they have some really nice classifier for you.

Feel free to report back here.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886


