Hi Umesh,
> I am trying to put the problem more concisely.
> 1. Fields where term frequency is highly relevant, e.g.
> Body:
> Example:
> if TF of badger in Body of doc 1 > TF of badger in Body of doc 2,
> doc 1 scores higher.
>
> 2. Fields where term frequency is irrelevant
>
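A minimal sketch of the two kinds of field described above, assuming a plain in-memory document (a Map from field name to text) rather than a Lucene index. The class FieldTfSketch, the sqrt(tf) weighting and the "presence counts as 1" rule are all illustrative assumptions, not Lucene API: in a TF-sensitive field such as Body more occurrences raise the score, while in any other field the term only counts once.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: score one term against a document whose
// fields are treated differently depending on whether TF should matter.
final class FieldTfSketch {

    // Counts occurrences of `term` in a lowercased, whitespace-tokenized field.
    static int termFreq(String fieldText, String term) {
        int freq = 0;
        for (String token : fieldText.toLowerCase().split("\\s+")) {
            if (token.equals(term)) {
                freq++;
            }
        }
        return freq;
    }

    // TF-sensitive fields contribute sqrt(tf); every other field contributes 1 if
    // the term is present at all and 0 otherwise.
    static double score(Map<String, String> doc, String term, List<String> tfSensitiveFields) {
        double total = 0.0;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            int tf = termFreq(field.getValue(), term);
            if (tfSensitiveFields.contains(field.getKey())) {
                total += Math.sqrt(tf);
            } else if (tf > 0) {
                total += 1.0;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tfFields = List.of("body");
        Map<String, String> doc1 = Map.of("body", "badger badger badger", "tags", "badger");
        Map<String, String> doc2 = Map.of("body", "badger", "tags", "badger badger badger");
        // doc1 wins on Body; repeating the term in Tags does not help doc2
        System.out.println(score(doc1, "badger", tfFields) > score(doc2, "badger", tfFields));
    }
}
```

The square root is only one choice of damping for the TF-sensitive field; raw tf would make the difference between doc 1 and doc 2 even larger.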
Hi.
Such 'autocompletion' features with Lucene could be provided with n-gram
tokenizers, as Erick states. I made a 'Bigram' analyzer for my master
thesis, when I was doing some research on how to enhance phrase
searching. This Analyzer treats pairs of words as single terms.
Basically, what the
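A minimal sketch of the bigram idea, assuming simple lowercasing and whitespace splitting. It is not the analyzer from the thesis; the class BigramSketch is hypothetical. In Lucene this logic would normally sit in a TokenFilter, and the contrib analyzers include a ShingleFilter that emits similar word n-grams, if your version ships it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn a text into word-bigram terms, so that each pair of
// adjacent words is indexed as one term.
final class BigramSketch {

    static List<String> bigrams(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            terms.add(words[i] + " " + words[i + 1]);   // adjacent pair becomes one term
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("New York City hotels"));
        // [new york, york city, city hotels]
    }
}
```

Because a phrase like "new york" is then a single indexed term, a two-word phrase query can be answered with a plain term lookup instead of checking positions.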
Let's not forget the scoring function. Yes, the *query* is important;
however, everyone in IR knows that two different scoring functions may
return two different sets of results for the same query!
David, I think you have to be more explicit here. What exactly are you
trying to do? Are you g
Dear All,
Thanks for your feedback.
I want to research how Lucene performs compared to Latent Semantic
Analysis in terms of recall and precision.
I welcome ideas on this. Does anyone know of a software tool using latent semantic
analysis that I could also download and try? At the moment I am
Hi,
I know it is very easy to get the frequency of a given term using the
IndexReader, but I am looking to perform an index search and would like to get
the frequency of the given term in the result set. Is this possible?
Thanks in advance,
Paul
---
I don't think this question makes a whole lot of sense in isolation.
Precision and recall are all about the *query*, and that is the art of the
developer: finding the appropriate query for your particular application.
Lucene does just great telling you which documents had which terms and
which t
I am not aware of any open source LSA framework out there. If you are
interested in PLSA, Lemur has got an implementation.
In a "simplest" sense Lucene is using a type of TFIDF scoring mechanism.
If you are not really concerned with Lucene's particular implementation,
then just use Lemur for your
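For reference, a simplified sketch of the classic TF-IDF formulas as I understand Lucene 2.x's DefaultSimilarity: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(number of terms in the field). For a single-term query with no boosts, queryNorm cancels one of the two idf factors, so the score comes out at roughly tf × idf × lengthNorm. This sketch ignores coord, boosts and the one-byte encoding of norms; the class TfIdfSketch is hypothetical.

```java
// Simplified sketch of classic Lucene-style TF-IDF scoring for one term.
// Not Lucene code: coord, boosts and norm byte-encoding are left out.
final class TfIdfSketch {

    // tf(freq) = sqrt(freq)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf = ln(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // lengthNorm = 1 / sqrt(number of terms in the field)
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Approximate score of a single-term query with no boosts.
    static double score(int freq, int fieldLength, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // the same tf and field length, but a rare term (9 of 1000 docs)
        // versus a common one (499 of 1000 docs)
        System.out.println(score(1, 10, 9, 1000));
        System.out.println(score(1, 10, 499, 1000));
    }
}
```

Swapping any of these three functions changes the ranking for the same query, which is one concrete way the choice of scoring function shows up in a precision/recall comparison.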
Hi Paul,
I am tempted to suggest the following (I am assuming here that the
document's relevant fields were indexed with term vectors enabled):
For every doc in the result set:
- get the doc id
- using the doc id, get the TermFreqVector of this document from the
index reader (tfv=ireader.getTermFr
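The loop above can be sketched without an index, assuming each hit's term vector is modelled as a Map from term to in-document frequency. The class ResultSetTermFreq and the Map model are hypothetical; against a real Lucene 2.x index the lookup would, I believe, go through IndexReader.getTermFreqVector(docId, field) and then indexOf(term) and getTermFrequencies() on the returned TermFreqVector.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sum a term's frequency over the documents in a result set.
// termVectors maps doc id -> {term -> frequency}; a missing entry stands for
// "no term vector stored" and contributes nothing.
final class ResultSetTermFreq {

    static int frequencyInResults(List<Integer> hitDocIds,
                                  Map<Integer, Map<String, Integer>> termVectors,
                                  String term) {
        int total = 0;
        for (int docId : hitDocIds) {
            Map<String, Integer> tfv = termVectors.get(docId);
            if (tfv != null) {
                total += tfv.getOrDefault(term, 0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Integer>> vectors = Map.of(
                0, Map.of("badger", 2, "cute", 1),
                1, Map.of("badger", 1),
                2, Map.of("badger", 5));
        List<Integer> hits = List.of(0, 1);   // pretend the search matched docs 0 and 1
        System.out.println(frequencyInResults(hits, vectors, "badger"));   // 3
    }
}
```

This costs one term-vector read per hit, so for very large result sets it may be cheaper to restrict the loop to the top N hits.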
On Jan 15, 2009, at 8:43 AM, Murat Yakici wrote:
> I am not aware of any open source LSA framework out there. If you are
> interested in PLSA, Lemur has got an implementation.
> In a "simplest" sense Lucene is using a type of TFIDF scoring mechanism.
> If you are not really concerned with Lucene's
I just ran into this
http://www.compass-project.org/docs/2.0.0/reference/html/needle-terracotta.html
and was wondering if any of you had tried anything like this and, if so,
what your experience was like.
Eric
There is a discussion here:
http://www.terracotta.org/web/display/orgsite/Lucene+Integration
Also of interest: "Katta - distribute lucene indexes in a grid"
http://katta.wiki.sourceforge.net/
-glen
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
http://zzzoot.blo
> First, it's a legitimate question whether matching on single-letter
> prefixes is useful for the user. If you're running into TooManyClauses,
> that means (if you haven't changed the defaults) that there are more
> than 1024 possibilities, which is far too many for the user to scan
> through.
That is
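A sketch of why a one-letter prefix can trip the clause limit, assuming an in-memory sorted term dictionary. As I understand it, Lucene 2.x rewrites a prefix query into a BooleanQuery with one clause per matching term and throws TooManyClauses once the default limit of 1024 is exceeded; this code only mimics that with a TreeSet and an IllegalStateException. The class PrefixExpansionSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: expand a prefix against a sorted term dictionary and
// fail once the number of matching terms exceeds a clause limit.
final class PrefixExpansionSketch {

    static final int MAX_CLAUSES = 1024;   // mirrors the default limit discussed above

    static List<String> expand(NavigableSet<String> terms, String prefix, int maxClauses) {
        List<String> matches = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break;   // sorted order: no later term can start with the prefix
            }
            if (matches.size() == maxClauses) {
                throw new IllegalStateException("too many clauses for prefix '" + prefix + "'");
            }
            matches.add(t);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add(String.format(Locale.ROOT, "a%04d", i));   // a0000 .. a1999
        }
        terms.add("badger");
        System.out.println(expand(terms, "a1", MAX_CLAUSES).size());   // 1000
        try {
            expand(terms, "a", MAX_CLAUSES);   // 2000 candidates, over the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Requiring a minimum prefix length before expanding (two or three characters, say) is a simple way to stay under the limit while still offering useful suggestions.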
Thanks for your input. I will try and apply your suggestion.
Thanks,
Peter
-----Original Message-----
From: Asbjørn A. Fellinghaug [mailto:asbj...@fellinghaug.com]
Sent: Thursday, January 15, 2009 3:25 AM
To: java-user@lucene.apache.org
Subject: Re: Google finance-like suggestible search field
: The question I'm trying to phrase is: is there a way to make the rank of a
: SHOULD term conditional?
:
: In the example, I'm trying to express "If the term MEDICAL is found, the
: term CAT ranks high; if the term ANIMAL is found, the term CAT ranks low."
except that there is an ambiguous si
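One way to picture the requested behaviour is a sketch where the weight of a match on CAT depends on which context terms the same document contains. The class ConditionalBoostSketch, the factors 2.0 and 0.5, and the rule that MEDICAL wins when both context terms occur are all assumptions; the question leaves that last case open.

```java
import java.util.Set;

// Hypothetical sketch: the contribution of a "cat" match depends on the other
// terms present in the same document.
final class ConditionalBoostSketch {

    static final double MEDICAL_BOOST = 2.0;   // assumed values, tune for your data
    static final double ANIMAL_DEMOTE = 0.5;

    static double catScore(Set<String> docTerms, double baseScore) {
        if (!docTerms.contains("cat")) {
            return 0.0;   // the SHOULD clause on "cat" did not match
        }
        // If both context terms occur, MEDICAL takes precedence here purely by assumption.
        if (docTerms.contains("medical")) {
            return baseScore * MEDICAL_BOOST;
        }
        if (docTerms.contains("animal")) {
            return baseScore * ANIMAL_DEMOTE;
        }
        return baseScore;
    }

    public static void main(String[] args) {
        System.out.println(catScore(Set.of("cat", "medical"), 1.0));   // 2.0
        System.out.println(catScore(Set.of("cat", "animal"), 1.0));    // 0.5
    }
}
```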
: This is not quite what I was talking about. I was talking about documents
: with a single field. I want the text "Badgers are mammals. Badgers are cute"
: to score higher than the text "Badger Badger" for the term query
: "text:badger".
: Ideally, what I want is to add another factor to the scor
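A sketch of why classic length normalization ranks "Badger Badger" first, assuming a crude stemmer so that "Badgers" and "badger" match (strip a trailing "s" from words longer than three letters; a real analyzer would use a proper stemmer). Both texts then have tf = 2 for "badger", and idf is identical for the same term, so the difference comes entirely from lengthNorm = 1/sqrt(field length): 2 terms versus 6. The class BadgerNormSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compare sqrt(tf) * 1/sqrt(fieldLength) for two short texts.
final class BadgerNormSketch {

    // Lowercases, splits on non-letters, and strips a trailing "s" from words
    // longer than three letters (a deliberately crude stand-in for a stemmer).
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (w.isEmpty()) {
                continue;
            }
            out.add(w.length() > 3 && w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        }
        return out;
    }

    // idf is omitted: it is the same for both texts when scoring the same term.
    static double score(String text, String term) {
        List<String> toks = tokens(text);
        if (toks.isEmpty()) {
            return 0.0;
        }
        int freq = 0;
        for (String t : toks) {
            if (t.equals(term)) {
                freq++;
            }
        }
        return Math.sqrt(freq) / Math.sqrt(toks.size());
    }

    public static void main(String[] args) {
        System.out.println(score("Badgers are mammals. Badgers are cute", "badger"));
        System.out.println(score("Badger Badger", "badger"));
    }
}
```

Any extra factor meant to reverse this ordering has to outweigh the sqrt(3) ratio between the two length norms.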
Hi,
I forgot to thank everyone who replied. It seems that caching the
IndexSearcher (properly :-) did the trick, giving more deterministic
memory usage and, more importantly, a substantial performance boost.
Did lots of other optimization of the queries (using rangefilter
ra