Hi All,
I have a Lucene index built with Lucene 4.9 for 584 text documents. I need to
extract a document-term matrix and a document-document similarity matrix
in order to cluster the documents. My questions: 1- How can I extract
the matrix and compute the similarity between documents in
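
For readers following this question, here is a minimal sketch of the underlying idea: turn each document into a term-frequency vector and compare the vectors pairwise with cosine similarity. It does not use the Lucene API. The class name CosineSketch, the whitespace tokenizer, and the raw-count weighting are all assumptions made for illustration; with a real index the per-document term counts would come from Lucene itself (for example from stored term vectors), and tf-idf weights may be preferable to raw counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: plain-Java cosine similarity.
class CosineSketch {

    // Build a term -> count map from lower-cased, whitespace-separated text.
    // (Assumption: a real setup would reuse the analyzer used at index time.)
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of two sparse vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Symmetric document-document similarity matrix for a small corpus.
    static double[][] similarityMatrix(String[] docs) {
        int n = docs.length;
        Map<String, Integer>[] vectors = new Map[n];
        for (int i = 0; i < n; i++) {
            vectors[i] = termFrequencies(docs[i]);
        }
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                sim[i][j] = cosine(vectors[i], vectors[j]);
                sim[j][i] = sim[i][j];
            }
        }
        return sim;
    }
}
```

For 584 documents the full matrix has only 584 x 584 entries, so the quadratic pairwise loop should be acceptable at that size.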
Hi all
I have a problem that might be very trivial, but I don't know how to solve it
using Lucene.
I created an index with Lucene for a huge data set of around 3 million documents
in various domains, and another index for a corpus of 30 documents in a specific
domain. For every document in the smal
.
>
>
>
> [1] http://wiki.apache.org/solr/MoreLikeThis
>
>
>
>
> Thanks and Regards,
> S SYED ABDUL KATHER
>
>
>
> On Mon, Jul 30, 2012 at 7:30 PM, Elshaimaa Ali [via Lucene] <
> ml-node+s472066n3998082...@n3.nabble.com> wrote:
>
>
code to decode the XML into
> documents...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jun 19, 2012 at 6:27 PM, Elshaimaa Ali
> wrote:
> >
> > Thanks Mike for the prompt reply. Do you have a fully indexed version of the
> > wiki
to fully index it. This is on a fairly beefy machine (24
> cores)... and trunk/4.0 has substantial concurrency improvements over
> 3.x.
>
> You can also try the ideas here:
>
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> Mike McCandless
>
> http://b
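
The reply above points to concurrency as one lever for indexing speed. A minimal sketch of that pattern, assuming the work per article can be handed to a pool of worker threads: the class ParallelIndexSketch and the indexOne callback are hypothetical stand-ins, not Lucene API. In a real setup the callback would build a document and add it through a single shared IndexWriter, which Lucene documents as safe to use from multiple threads.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper: fans articles out to worker threads.
class ParallelIndexSketch {

    // Runs indexOne for every article on a fixed-size pool and returns
    // how many articles were handed to the callback successfully.
    static int indexAll(List<String> articles, int threads, Consumer<String> indexOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (String article : articles) {
            pool.execute(() -> {
                indexOne.accept(article); // stand-in for adding to the index
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            // Assumption: one hour is a generous upper bound for this sketch.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return done.get();
    }
}
```

The thread count is a tuning knob; a value near the number of available cores is a common starting point, but the best setting depends on how fast the source database can deliver rows.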
Hi everybody
I'm using Lucene 3.6 to index Wikipedia documents, which is over 3 million
articles. The data is in a MySQL database, and indexing has taken more than
24 hours so far. Do you know any tips that can speed up the indexing process?
Here is my code:
public static void main(String[] args) {