Hi,
I have the same question related to the LMJelinekMercerSimilarity class.
protected float score(BasicStats stats, float freq, float docLen) {
  return stats.getTotalBoost() *
      (float)Math.log(1 + ((1 - lambda) * freq / docLen) /
          (lambda * ((LMStats)stats).getCollectionProbability()));
}
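For reference, the formula in that score method can be checked in isolation. The sketch below is plain Java, not Lucene API: the method shape mirrors the snippet above, and the lambda and collection-probability values in main are made-up examples.

```java
public class JMScore {
    // Plain-Java restatement of the Jelinek-Mercer score above:
    // boost * log(1 + ((1 - lambda) * tf / |d|) / (lambda * P(t|C))).
    // All parameter values used below are illustrative, not from Lucene.
    static double score(double boost, double lambda, double freq,
                        double docLen, double collectionProb) {
        return boost * Math.log(1
                + ((1 - lambda) * freq / docLen)
                / (lambda * collectionProb));
    }

    public static void main(String[] args) {
        // term occurs 3 times in a 100-token document; P(t|C) = 0.001
        System.out.println(score(1.0, 0.7, 3, 100, 0.001));
    }
}
```

Note that a term with freq = 0 scores exactly 0 (log of 1), and the score grows with freq, which is the sanity check one usually wants from this smoothing.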
Hi,
any news since?
Thanks,
Best regards,
ZP
--
View this message in context:
http://lucene.472066.n3.nabble.com/pruning-Lucene-4-0-tp4013363p4041499.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Hi to all,
I started to use benchmark 4.0 to create submission report files with the
following code:
BufferedReader br = new BufferedReader(fr);
QualityQuery qqs[] = qReader.readQueries(br);
QualityQueryParser qqParser = new SimpleQQParser("title", "body");
Hi,
Do you have any information about when the pruning package will be available
for Lucene 4.0?
Best Regards
Thanks in advance
ZP
--
View this message in context:
http://lucene.472066.n3.nabble.com/pruning-Lucene-4-0-tp4013363.html
Hi to all,
I used the pruning package with the LA Times collection. The initial LA Times
index was created by the Lucene benchmark/conf/*.alg. Luke shows 131896
documents with 635614 terms for the initial index. I pruned with the
CarmelTopKPruning policy with epsilon = 0.1 while varying k. However, my
results do not cor
Hi to all,
I found the problem and the solution. In PruningReader,
super.getSequentialSubReaders() is used. After 28118, super.next() is false
because it is a subreader for a segment and indexReader.maxDoc() is equal to
28118 for that segment. In pruneAllPositions, instead of comparing
termposition
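The per-segment doc id arithmetic described here can be illustrated with a small sketch. Everything below is hypothetical plain Java (segmentMaxDocs and toSegmentLocal are illustrative names, not pruning-package API); it only shows why a global doc id such as 28118 corresponds to local id 0 of the next sub-reader, since each segment's TermPositions restarts doc ids at 0.

```java
public class DocBase {
    // Map a global doc id to {segment index, segment-local doc id} given
    // each segment's maxDoc. Per-segment readers number docs from 0, so a
    // global id must have the segment's base subtracted before comparison.
    static int[] toSegmentLocal(int[] segmentMaxDocs, int globalDoc) {
        int base = 0;
        for (int seg = 0; seg < segmentMaxDocs.length; seg++) {
            if (globalDoc < base + segmentMaxDocs[seg]) {
                return new int[] { seg, globalDoc - base };
            }
            base += segmentMaxDocs[seg];
        }
        throw new IllegalArgumentException("doc id out of range: " + globalDoc);
    }

    public static void main(String[] args) {
        int[] maxDocs = { 28118, 50000, 53778 }; // illustrative segment sizes
        int[] loc = toSegmentLocal(maxDocs, 28118); // first doc past segment 0
        System.out.println("segment " + loc[0] + ", local id " + loc[1]);
    }
}
```

With a first segment of maxDoc 28118, global id 28118 is segment 1, local id 0, which matches the behavior described in the message.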
Hi to all,
In the pruning package, for the pruneAllPositions(TermPositions termPositions,
Term t) method it is said that:
"termPositions - positioned term positions. Implementations MUST NOT advance
this by calling TermPositions methods that advance either the position
pointer (next, skipTo) or term poi
Hi,
In CarmelTopKTermPruningPolicy class, the threshold is calculated as
follows:
float threshold = docs[k - 1].score - scoreDelta;
docs[k - 1].score corresponds to z_t in the original paper (Carmel et al
2001) and scoreDelta = epsilon * r
Could you please explain to me why it is calculated
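For what it's worth, the quoted line can be exercised outside Lucene. The sketch below is a plain-Java restatement, assuming docs is sorted by descending score (so docs[k - 1] is the k-th highest score, z_t); r is left as a caller-supplied value, since its exact definition is what this question is about.

```java
import java.util.Arrays;

public class CarmelThreshold {
    // Restates "threshold = docs[k - 1].score - scoreDelta" with
    // scoreDelta = epsilon * r, as in the quoted snippet. scores[] need
    // not be pre-sorted here; we find the k-th highest score ourselves.
    static float threshold(float[] scores, int k, float epsilon, float r) {
        float[] sorted = scores.clone();
        Arrays.sort(sorted); // ascending
        float zt = sorted[sorted.length - k]; // k-th highest score, z_t
        float scoreDelta = epsilon * r;
        return zt - scoreDelta;
    }

    public static void main(String[] args) {
        float[] scores = { 0.9f, 0.7f, 0.5f, 0.3f }; // illustrative scores
        System.out.println(threshold(scores, 2, 0.1f, 1.0f));
    }
}
```

Any posting scoring at or above this threshold survives pruning; lowering epsilon tightens the band below z_t that is kept.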
Hi,
Thanks for your fix. I used it, but I think there is something wrong with the
fix, because I am using the LA Times collection, and with epsilon = 0.1 and
k = 10 I got a 97% pruned index. That means only 3% of the index is left after
pruning. In the original paper, "Static index pruning for IR systems
Hi,
You can use Kendall's tau. An article titled "Comparing top k lists" by Ronald
Fagin, Ravi Kumar and D. Sivakumar explains different methods.
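A minimal full-list Kendall's tau looks like the sketch below (plain Java, written for this thread; the Fagin, Kumar and Sivakumar paper generalises this to top-k lists where the two rankings may contain different items).

```java
public class KendallTau {
    // Kendall's tau over two full rankings of the same n items:
    // rankA[i] and rankB[i] give item i's position in each ranking.
    // tau = (concordant - discordant) / (n * (n - 1) / 2), in [-1, 1].
    static double tau(int[] rankA, int[] rankB) {
        int n = rankA.length, concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                int a = Integer.compare(rankA[i], rankA[j]);
                int b = Integer.compare(rankB[i], rankB[j]);
                if (a * b > 0) concordant++;       // pair ordered the same way
                else if (a * b < 0) discordant++;  // pair ordered oppositely
            }
        }
        return (concordant - discordant) / (0.5 * n * (n - 1));
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3, 4 };
        int[] b = { 1, 2, 4, 3 }; // same order except the last two swapped
        System.out.println(tau(a, b));
    }
}
```

Identical rankings give tau = 1, reversed rankings give -1, which makes it convenient for comparing result lists from a pruned index against the unpruned baseline.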
Best Regards,
ZP
--
View this message in context:
http://lucene.472066.n3.nabble.com/Measuring-precision-and-recall-in-lucene-to-compare-two-sets-
Thanks for the link. I reviewed it.
Here are more details about the exception:
I used contrib/benchmark/conf/wikipedia.alg to index a Wikipedia dump with
MAddDocs: 20. I wanted to index only a specific period of time, so I added an
if statement in doLogic of the AddDocTask class.
I tried to prune t
Hi,
In the pruning package, pruneAllPositions throws an exception. In the code
it is commented that it should not happen.
// should not happen!
throw new IOException("termPositions.doc > docs[docsPos].doc");
Can you please explain to me why it happens and what I should do to fix it?
Thanks in advance
Hi,
Thanks for the fix.
I also wonder if you know of any free collections for testing pruning
approaches. Almost all the papers use TREC collections, which I don't have!
For now, I use the Reuters21578 collection and Carmel's Kendall's tau
extension to measure similarity. But I need a collection with
wikipedia.alg in benchmark is only able to extract and index current-pages
dumps. It does not take revisions into account. Do you know of any way to do
this? Or should I change EnwikiContentSource to handle the versions?
Although Wikipedia dumps are widely used, especially for research purposes,
as f
While using the pruning package, I realised that ridf is calculated in
RIDFTermPruningPolicy as follows:
Math.log(1 - Math.pow(Math.E, termPositions.freq() / maxDoc)) - df
However, according to the original paper (Blanco et al.) for residual idf,
it should be -log(df/D) + log(1 - e^(-tf/D)). T
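The sign difference is easy to demonstrate in plain Java. Below, residualIdf follows the paper's formula as quoted in this message; using the natural log throughout is an assumption made here for simplicity.

```java
public class Ridf {
    // Residual idf as quoted from the paper: -log(df/D) + log(1 - e^(-tf/D)),
    // where df = document frequency, tf = total term frequency, D = number
    // of documents. Log base (natural log here) is an assumption.
    static double residualIdf(double df, double tf, double D) {
        return -Math.log(df / D) + Math.log(1 - Math.exp(-tf / D));
    }

    public static void main(String[] args) {
        // illustrative values: term in 100 of 10000 docs, total tf = 150
        System.out.println(residualIdf(100, 150, 10000));
    }
}
```

With the minus sign missing from the exponent, as in the quoted code, e^(tf/D) exceeds 1 for any positive tf, so 1 - e^(tf/D) is negative and the log evaluates to NaN, which is another symptom of the reported bug.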
That is perfect
Thank you very much
Best regards
ZP
--
View this message in context:
http://lucene.472066.n3.nabble.com/delete-entries-from-posting-list-Lucene-4-0-tp3838649p3839095.html
I need to delete entries from posting lists. How can I do this in Lucene 4.0?
I need this to test different pruning algorithms.
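As a toy model of the operation being asked about, the sketch below deletes entries from a posting list represented as a sorted list of doc ids. This is illustrative plain Java only, not the Lucene 4.0 API; in Lucene, removing postings would mean re-indexing or wrapping a reader that filters them out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.HashSet;

public class PrunePostings {
    // Drop the given doc ids from a posting list (sorted list of doc ids).
    // A pruning experiment would choose toDelete via some scoring policy.
    static List<Integer> prune(List<Integer> postings, Set<Integer> toDelete) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : postings) {
            if (!toDelete.contains(doc)) kept.add(doc);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> postings = Arrays.asList(2, 5, 9, 14); // toy posting list
        System.out.println(prune(postings, new HashSet<>(Arrays.asList(5, 14))));
    }
}
```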
Thanks in advance
ZP
--
View this message in context:
http://lucene.472066.n3.nabble.com/delete-entries-from-posting-list-Lucene-4-0-tp3838649p3838649.html
Hi,
I am having a weird experience. I made a few changes to the source code
(Lucene 3.3). I created a basic application to test it. First, I added the
Lucene 3.3 project to the basic project as a "required project on the build
path" to be able to debug. When everything was OK, I removed it from required