Hi,
Sorry for my ignorance, but how do I obtain an AtomicReader from an IndexReader?
I figured out the code below, but it gives me a list of atomic readers.
for (AtomicReaderContext context : reader.leaves()) {
    NumericDocValues docValues = context.reader().getNormValues(field);
    if (docValues != null)
        normValu
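For reference, a minimal sketch of working per leaf (Lucene 4.x API, with
"reader" and "field" assumed to exist as above). There is no single
AtomicReader behind a composite IndexReader; you either iterate the leaves,
mapping leaf-local doc IDs back with docBase, or wrap the whole reader with
SlowCompositeReaderWrapper.wrap(reader) at some performance cost:

    for (AtomicReaderContext context : reader.leaves()) {
        AtomicReader leaf = context.reader();
        NumericDocValues norms = leaf.getNormValues(field);
        if (norms == null)
            continue;                            // the field may omit norms
        for (int docID = 0; docID < leaf.maxDoc(); docID++) {
            long norm = norms.get(docID);        // encoded norm for this doc
            int globalDocID = context.docBase + docID;
            // use (globalDocID, norm) here
        }
    }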
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote:
> Hi Michael,
>
> Thanks for the explanation. I am working with a TREC dataset;
> since it is static, I set the size of that array experimentally.
>
> I followed the DefaultSimilarity#lengthNorm method a bit.
>
> If default similarity and no index ti
?
Thanks,
Ahmet
On Friday, February 6, 2015 11:08 AM, Michael McCandless wrote:
How will you know how large to allocate that array? The within-doc
term freq can in general be arbitrarily large...
Lucene does not directly store the total number of terms in a
document, but it does store it approximately in the doc's norm value.
Maybe you can use that? Alternatively, yo
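To illustrate the "approximately in the doc's norm value" point: assuming
DefaultSimilarity and no index-time boosts, the stored norm is roughly
1/sqrt(numTerms) squeezed into a single byte, so the length can only be
recovered approximately. A hedged sketch, where "encoded" is a value read
from getNormValues(field) as in the snippet above:

    long encoded = norms.get(docID);
    // SmallFloat.byte315ToFloat is the decoder DefaultSimilarity uses in 4.x
    float lengthNorm = org.apache.lucene.util.SmallFloat.byte315ToFloat((byte) encoded);
    int approxNumTerms = lengthNorm == 0f ? 0 : Math.round(1f / (lengthNorm * lengthNorm));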
Hello Lucene Users,
I am traversing all documents that contain a given term with the following code:
Term term = new Term(field, word);
Bits bits = MultiFields.getLiveDocs(reader);
DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field,
term.bytes());
while (docsEnum.nextDoc() != Doc
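For completeness, a self-contained version of that traversal (Lucene 4.x
MultiFields API, with "reader", "field" and "word" assumed to exist); the
null check matters because getTermDocsEnum returns null when the term does
not occur in the index:

    Term term = new Term(field, word);
    Bits liveDocs = MultiFields.getLiveDocs(reader);
    DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, liveDocs, field, term.bytes());
    if (docsEnum != null) {
        int doc;
        while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            int freq = docsEnum.freq();   // within-document term frequency
            // process (doc, freq); deleted docs are already skipped via liveDocs
        }
    }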
quot;A", "B", "C", "D", "E")
> > How to search documents that contain a number of terms in that list
> > but do not care what terms are.
> > For example, any documents that include any 3 terms in the above list are
> > matched.
> &g
Try BooleanQuery.setMinimumNumberShouldMatch
2010/1/21 Phan The Dai :
> Hi everyone, I need your support with this question:
> Assuming that I have some terms, such as: ("A", "B", "C", "D", "E")
> How to search documents that contain a nu
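A short sketch of that suggestion (the field name "content" is only an
example); with five SHOULD clauses and minimumNumberShouldMatch set to 3, a
document matches as soon as it contains any three of the terms:

    BooleanQuery query = new BooleanQuery();
    for (String t : new String[] { "A", "B", "C", "D", "E" }) {
        query.add(new TermQuery(new Term("content", t)), BooleanClause.Occur.SHOULD);
    }
    query.setMinimumNumberShouldMatch(3);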
Hi everyone, I need your support with this question:
Assuming that I have some terms, such as: ("A", "B", "C", "D", "E")
How do I search for documents that contain a certain number of terms
from that list, without caring which terms they are?
For example, any docume
So not much help here (I wonder if it's because I posted 3 questions in
one day), but I've made some progress in my understanding.
I understand there is only one norm per field, and I think Lucene does not
differentiate between adding the same field a number of times and
adding multiple texts to th
Thanks Felipe, but you are missing the point. Artist really doesn't come
into it; my problem is confined to the alias field. Forget about artist,
it's just detailed to give the complete scenario.
Paul
Felipe wrote:
You could change the boost of the field artist to be bigger than the
field alias.
You could change the boost of the field artist to be bigger than the field
alias.
field.setBoost(artistBoost);
2010/1/12 Paul Taylor
Been doing some analysis with Luke (BTW, it doesn't work with StandardAnalyzer
since the Version field was introduced) and discovered a problem with field
length boosting for me.
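For what it's worth, a minimal sketch of Felipe's suggestion on the indexing
side (Lucene 2.9/3.x Field API; the boost value and the artistName/aliasName
variables are just examples):

    Document doc = new Document();
    Field artist = new Field("artist", artistName, Field.Store.YES, Field.Index.ANALYZED);
    Field alias = new Field("alias", aliasName, Field.Store.YES, Field.Index.ANALYZED);
    artist.setBoost(2.0f);   // weight matches on artist above matches on alias
    doc.add(artist);
    doc.add(alias);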
Been doing some analysis with Luke (BTW, it doesn't work with
StandardAnalyzer since the Version field was introduced) and discovered a
problem with field length boosting for me.
I have a document that represents a recording artist (i.e. Madonna, The
Beatles, etc.); it contains an artist and an alias field
>>> ...just brainstorming type discussions now.
>>>
>>> You could always do something approximate outside of Lucene? EG, make
>>> a TokenFilter that counts how many tokens are produced for each
>>> field/doc, aggregate & store that yourself, and use it in
You could always do something approximate outside of Lucene? E.g., make a
TokenFilter that counts how many tokens are produced for each field/doc,
aggregate & store that yourself, and use it in your similarity impl?
Mike
On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote:
>
> any ideas please?
> --
> View this message in context:
> http://old.nabble.com/Scoring-formula---Average
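A rough sketch of that idea (Lucene 2.9/3.x analysis API): a filter that
counts the tokens actually produced for a field, which you could then
aggregate and store outside Lucene and feed into your Similarity. How and
where the count is stored is left open here:

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public final class CountingTokenFilter extends TokenFilter {
        private int count;

        public CountingTokenFilter(TokenStream input) {
            super(input);
        }

        public boolean incrementToken() throws IOException {
            if (input.incrementToken()) {
                count++;
                return true;
            }
            return false;
        }

        public void reset() throws IOException {
            super.reset();
            count = 0;
        }

        public int getCount() {
            return count;    // read this after the field has been consumed
        }
    }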
any ideas please?
--
View this message in context:
http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Hi,
I want to change the default scoring formula of Lucene, and one of the
changes I want to perform is on the idf term. What I want to do is to
include the average number of terms of the documents indexed in the
collection in the idf method of the Similarity class.
In order to change the
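One hedged way to do that (Lucene 2.9/3.x): subclass DefaultSimilarity and
fold an externally computed average into idf. Lucene does not provide the
average number of terms per document, so here it is simply passed in as a
constructor argument, and the exact combination with the standard idf is
only an example:

    import org.apache.lucene.search.DefaultSimilarity;

    public class AvgLengthIdfSimilarity extends DefaultSimilarity {
        private final float avgTermsPerDoc;

        public AvgLengthIdfSimilarity(float avgTermsPerDoc) {
            this.avgTermsPerDoc = avgTermsPerDoc;
        }

        public float idf(int docFreq, int numDocs) {
            // Lucene's default idf, scaled by a factor based on average doc length
            float standard = (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
            return standard * (float) Math.log(1.0 + avgTermsPerDoc);
        }
    }

    // searcher.setSimilarity(new AvgLengthIdfSimilarity(avgTermsPerDoc));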
Hi Chris,
by "number of terms", do you mean the number of different terms that
compose the index, or the numers of total terms, including repetitions?
chris.b escribió:
I'm sure this has been asked a few times before, but i searched and searched
and found no answer (apart
I'm sure this has been asked a few times before, but I searched and searched
and found no answer (apart from using Luke), but I would like to know if
there's a way of retrieving the number of terms in an index.
I tried cycling through a TermEnum, but it doesn't do anything :|
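For the record, cycling a TermEnum works (pre-4.0 IndexReader API; "reader"
is an already opened IndexReader, and the enum starts positioned before the
first term, so next() must be called first). This counts the distinct terms
across all fields:

    TermEnum termEnum = reader.terms();   // positioned before the first term
    long numTerms = 0;
    while (termEnum.next()) {
        numTerms++;
    }
    termEnum.close();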
Thanks a lot,
but one question: the IndexOutput class doesn't have a writeFloat method?
How do you write a float to the index?
Shall I create a public writeFloat method such as:
public void writeFloat(float f) throws IOException {
    // convert to raw IEEE 754 bits first; a float cannot be shifted directly
    int bits = Float.floatToIntBits(f);
    writeByte((byte) (bits >> 24));
    writeByte((byte) (bits >> 16));
    writeByte((byte) (bits >> 8));
    writeByte((byte) bits);
}
On 16 Oct 2007, at 13:07, sandeep chawla wrote:
While calculating the lengthNorm, there is a precision loss.
http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting
How to avoid the precision loss?
You replace the use of bytes with floats when storing the norms
(DocumentsWriter) in the f
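To see where the loss happens (Lucene 2.x API; the field name and length
below are arbitrary examples): the length norm is a float, but it is stored
as a single byte, so many nearby values decode back to the same number:

    float norm = new DefaultSimilarity().lengthNorm("body", 57);   // 1/sqrt(57)
    byte encoded = Similarity.encodeNorm(norm);      // the 8-bit on-disk form
    float decoded = Similarity.decodeNorm(encoded);  // generally != norm
    System.out.println(norm + " -> " + decoded);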
Hi,
While calculating the lengthNorm, there is a precision loss.
http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting
How to avoid the precision loss?
Thanks
Sandeep
--
SANDEEP CHAWLA
House No- 23
10th main
BTM 1st Stage
Bangalore Mobile: 91-9986150603
Hello,
Is it possible to quickly get the total number of terms from all
documents in a Lucene index for a given field?
For example IndexReader has a method "int numDocs()", I would need a
similar method "int numTerms(String field)".
It looks a bit silly to use IndexReader.t
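As far as I know, there is no stored per-field total in that era of Lucene,
but here is a hedged sketch of computing it by brute force (pre-4.0 API;
"reader" and "field" are assumed): walk the terms of the field and sum the
term frequencies from TermDocs:

    long totalOccurrences = 0;
    TermEnum terms = reader.terms(new Term(field, ""));
    try {
        while (terms.term() != null && terms.term().field().equals(field)) {
            TermDocs termDocs = reader.termDocs(terms.term());
            while (termDocs.next()) {
                totalOccurrences += termDocs.freq();
            }
            termDocs.close();
            if (!terms.next()) break;
        }
    } finally {
        terms.close();
    }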
Paul Elschot wrote:
In case you prefer to use the maximum score over the clauses you
can use the DisjunctionMaxQuery from the development version.
Yes, that may help! I'll need to have a look...
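A tiny sketch of the DisjunctionMaxQuery suggestion (available from the 1.9
development version; the tie-breaker of 0.0f is just an example): it scores
a document by the maximum of the per-field scores instead of their sum, so a
short title match is not diluted:

    DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.0f);
    dmq.add(new TermQuery(new Term("title", "europe")));
    dmq.add(new TermQuery(new Term("subtitle", "europe")));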
Subject: Re: Scoring by number of terms in field
Paul Elschot wrote:
For example, a query for "europe" should rank:
1. title:"Europe"
2. title:"History of Europe"
3. title:"Travel in Europe, Middle East and Africa"
4. subtitle:"Fairy Tales from Europe"
Perhaps with this query (assuming the default implicit OR):
title:europe subtitle:europe^
Sorry for the quick reply, but yes you can accomplish this by
tweaking a custom Similarity implementation (or DefaultSimilarity
subclass). Check out IndexSearcher.explain on a query and a document
and then tinker.
Erik
On Jan 9, 2006, at 4:34 AM, Eric Jain wrote:
Lucene seems to
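A hedged sketch of Erik's suggestion: a DefaultSimilarity subclass whose
lengthNorm falls off faster than the default 1/sqrt(numTerms), so longer
fields are penalized more and short-field matches rank higher. The exponent
is an arbitrary example, and note that lengthNorm is baked into the stored
norms, so the similarity has to be in place at indexing time as well as on
the searcher:

    public class ShortFieldSimilarity extends DefaultSimilarity {
        public float lengthNorm(String fieldName, int numTerms) {
            // steeper than the default 1/sqrt(numTerms)
            return (float) (1.0 / Math.pow(numTerms, 0.75));
        }
    }

    // writer.setSimilarity(new ShortFieldSimilarity());    // before indexing
    // searcher.setSimilarity(new ShortFieldSimilarity());  // at query time
    // System.out.println(searcher.explain(query, docId));  // inspect the effect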
On Monday 09 January 2006 10:34, Eric Jain wrote:
> Lucene seems to prefer matches in shorter documents. Is it possible to
> influence the scoring mechanism to have matches in shorter fields score
> higher instead?
A query is always in at least one field of a document.
>
> For example, a query
Lucene seems to prefer matches in shorter documents. Is it possible to
influence the scoring mechanism to have matches in shorter fields score
higher instead?
For example, a query for "europe" should rank:
1. title:"Europe"
2. title:"History of Europe"
3. title:"Travel in Europe, Middle East a