Document Term matrix

2014-11-11 Thread Elshaimaa Ali
Hi All, I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to extract a document-term matrix and a document-document similarity matrix in order to cluster the documents. My questions: 1- How can I extract the matrix and compute the similarity between documents in

comparing documents in 2 indexes

2012-11-15 Thread Elshaimaa Ali
Hi all, I have a problem that might be trivial, but I don't know how to solve it using Lucene. I created an index with Lucene for a huge data set of around 3 million documents in various domains, and another index for a corpus of 30 documents in a specific domain. For every document in the smal

RE: Document Similarity

2012-07-30 Thread Elshaimaa Ali
. > > > > [1] http://wiki.apache.org/solr/MoreLikeThis > > > > > Thanks and Regards, > S SYED ABDUL KATHER > > > > On Mon, Jul 30, 2012 at 7:30 PM, Elshaimaa Ali [via Lucene] < > ml-node+s472066n3998082...@n3.nabble.com> wrote: > >

RE: Wikipedia Index

2012-06-19 Thread Elshaimaa Ali
code to decode the XML into > documents... > > Mike McCandless > > http://blog.mikemccandless.com > > On Tue, Jun 19, 2012 at 6:27 PM, Elshaimaa Ali > wrote: > > > > Thanks Mike for the prompt reply. Do you have a fully indexed version of the > > wiki

RE: Wikipedia Index

2012-06-19 Thread Elshaimaa Ali
to fully index it. This is on a fairly beefy machine (24 > cores)... and trunk/4.0 has substantial concurrency improvements over > 3.x. > > You can also try the ideas here: > > http://wiki.apache.org/lucene-java/ImproveIndexingSpeed > > Mike McCandless > > http://b

Wikipedia Index

2012-06-19 Thread Elshaimaa Ali
Hi everybody, I'm using Lucene 3.6 to index Wikipedia documents, more than 3 million articles. The data is in a MySQL database, and indexing has taken more than 24 hours so far. Do you know any tips that can speed up the indexing process? Here is my code: public static void main(String[] args) {