either
Lucene at ingestion or Mahout at post-processing? The Vector Space Model
seems to be notionally similar to PCA or Factor Analysis, which both have
similar ambitions. Thoughts???
Thank you in advance
Regards,
Rich Heimann
Richard Heimann
If you are convinced that length normalization is the culprit you could
> give
> a try to:
> - omitting norms altogether at indexing
> - using e.g. SweetSpotSimilarity, which does not favor shorter documents.
> Regards,
> Doron
>
> On Thu, May 19, 2011 at 5:20 PM, Rich Heimann
re IDF (in similarity? in
> solrconfig?).
>
> paul
>
>
> On May 18, 2011, at 21:30, Rich Heimann wrote:
>
> > Hello all,
> >
> > This is my first time on the list and my first question...forgive me if
> this
> > has been hashed out in the past.
Hello all,
This is my first time on the list and my first question...forgive me if this
has been hashed out in the past.
We have set up Lucene/Solr and are getting somewhat spurious results. It
appears to be a result of heterogeneous document sizes. In other words, the
top results are sometimes (