How can I eliminate near duplicates from the index? Someone suggested that I
could look at the TermVectors and do a comparison to remove the duplicates.
One major problem with this is that the structure of the document is no
longer taken into account. Are there any obvious pitfalls? For example: Document A being
>>
>> On 17 Oct 2006 at 17:54, Find Me wrote:
>>
>>> How to eliminate near duplicates from the index?
>>
>> I would probably try to measure the Euclidean distance between all
>> documents, computed on terms and their positions. Or perhaps use
>> stan
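The distance idea in the quoted reply can be sketched roughly as follows. This is only a minimal illustration, not the poster's method: it compares plain term-frequency vectors built from raw text, ignores term positions, and does not touch Lucene's TermVector API at all. The class name, the method names, and the threshold parameter are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: Euclidean distance between term-frequency vectors.
class TermVectorDistance {

    // Builds a term -> frequency map from raw text (lowercased, split on non-word characters).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Euclidean distance over the union of terms; a missing term counts as frequency 0.
    static double euclideanDistance(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> terms = new HashSet<>(a.keySet());
        terms.addAll(b.keySet());
        double sumOfSquares = 0.0;
        for (String term : terms) {
            int diff = a.getOrDefault(term, 0) - b.getOrDefault(term, 0);
            sumOfSquares += (double) diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Convenience overload that works directly on two texts.
    static double distance(String x, String y) {
        return euclideanDistance(termFrequencies(x), termFrequencies(y));
    }

    // Treats two texts as near duplicates when their distance is at most the
    // caller-chosen threshold (an assumption; a suitable value depends on the corpus).
    static boolean isNearDuplicate(String x, String y, double threshold) {
        return distance(x, y) <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(distance("the quick brown fox", "the quick brown dog"));
        System.out.println(distance("alpha beta", "beta alpha"));
    }
}
```

Note that "alpha beta" and "beta alpha" come out at distance zero here, which is exactly the pitfall raised in the original question: once only term counts are compared, the structure of the document no longer matters.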
On 12/11/06, Waheed Mohammed <[EMAIL PROTECTED]> wrote:
Hello,
Is there a way to influence Lucene's generation of IDs while indexing?
My requirement is that I want to have different indexes where no index
has IDs that were already assigned to an earlier index.
For instance:
IDX1 : {0.1
On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote:
Thanks in advance for any insight on this one.
I have a fairly large query to run, and it takes roughly 20-40 seconds to
complete the way that I have it.
Here is the best example I can give.
I have a set of roughly 25K documents indexed.
I have que
I was trying to print out the score explanation for a DisjunctionMaxQuery.
Although each result has a hit score > 0, there is no detailed
explanation. Am I doing something wrong?
In the following output, each hit has two lines. The first line is the hit
score and the second line is the expl
public void explainSearchScore(String indexLocation,
                               DisjunctionMaxQuery disjunctQuery)
        throws IOException {
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation));
    Hits hits = searcher.search(disjunctQuery);
    if (hits == null) return;
    for (int i = 0; i < hits.length(); i++) {
        // Note: Searcher.explain(Query, int) expects a document id, while i
        // is only the rank of the hit; hits.id(i) returns the document id.
        System.out.println("Hit " + i + ": " + hits.score(i) +
            "\n" + searcher.explain(disjunctQuery, i).toString());
    }
}
Find Me wrote:
public void explai
For:
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(messageQuery, true, false);
Use:
BooleanQuery bQuery = new BooleanQuery();
bQuery.add(messageQuery, BooleanClause.Occur.MUST);
Mapping is as follows:
For add(query, true, false) use add(query, BooleanClause.Occur.MUST)
For add(query, false, fal
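
The translation from the old (required, prohibited) flags to an occur value can also be written as a small helper. The sketch below is only an illustration: the Occur enum is a local stand-in that mirrors the names of Lucene's BooleanClause.Occur so the example compiles without Lucene on the classpath, and the class and method names are hypothetical.

```java
// Hypothetical helper showing how the deprecated boolean flags map to occur values.
class OccurMapping {

    // Local stand-in mirroring the constants of Lucene's BooleanClause.Occur.
    enum Occur { MUST, SHOULD, MUST_NOT }

    // required=true,  prohibited=false -> MUST
    // required=false, prohibited=false -> SHOULD
    // required=false, prohibited=true  -> MUST_NOT
    // required=true,  prohibited=true  -> rejected, since a clause cannot be both
    static Occur fromFlags(boolean required, boolean prohibited) {
        if (required && prohibited) {
            throw new IllegalArgumentException(
                "A clause cannot be both required and prohibited");
        }
        if (required) {
            return Occur.MUST;
        }
        if (prohibited) {
            return Occur.MUST_NOT;
        }
        return Occur.SHOULD;
    }

    public static void main(String[] args) {
        System.out.println(fromFlags(true, false));
        System.out.println(fromFlags(false, false));
        System.out.println(fromFlags(false, true));
    }
}
```

With real Lucene code, the value would be passed straight to BooleanQuery.add(query, occur), as in the "Use:" example above.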