Re: getting number of terms in a document/field

2015-02-08 Thread Ahmet Arslan
Hi, Sorry for my ignorance, but how do I obtain an AtomicReader from an IndexReader? I figured out the code above, but it gives me a list of atomic readers. for (AtomicReaderContext context : reader.leaves()) { NumericDocValues docValues = context.reader().getNormValues(field); if (docValues != null) normValu

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote: > Hi Michael, > > Thanks for the explanation. I am working with a TREC dataset, > since it is static, I set size of that array experimentally. > > I followed the DefaultSimilarity#lengthNorm method a bit. > > If default similarity and no index ti

Re: getting number of terms in a document/field

2015-02-06 Thread Ahmet Arslan
? Thanks, Ahmet On Friday, February 6, 2015 11:08 AM, Michael McCandless wrote: How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless
How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, yo
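The norm-based approach mentioned above can be sketched in plain Java. This is a minimal illustration, assuming the classic length norm of 1/sqrt(numTerms) with no index-time boost; the class and method names are hypothetical and not part of Lucene. Since the stored norm is only approximate, the recovered count should be treated as a rough estimate.

```java
/** Hypothetical helper (not a Lucene class): inverts the classic length norm. */
final class NormLengthEstimate {

    /** Assumed norm formula with no index-time boost: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Inverts the formula above: numTerms is approximately 1 / norm^2. */
    static long estimateNumTerms(float norm) {
        if (norm <= 0f) {
            throw new IllegalArgumentException("norm must be positive");
        }
        double n = norm; // widen to double before squaring
        return Math.round(1.0 / (n * n));
    }

    public static void main(String[] args) {
        // Round trip for a 16-term field under the assumed formula.
        System.out.println(estimateNumTerms(lengthNorm(16)));
    }
}
```

In a real index the float passed to `estimateNumTerms` would be the decoded norm for the document, which has already lost precision, so short fields are recovered more faithfully than long ones.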

getting number of terms in a document/field

2015-02-05 Thread Ahmet Arslan
Hello Lucene Users, I am traversing all documents that contain a given term with the following code: Term term = new Term(field, word); Bits bits = MultiFields.getLiveDocs(reader); DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field, term.bytes()); while (docsEnum.nextDoc() != Doc

Re: A interesting question (search by number of terms)

2010-01-21 Thread Phan The Dai
"A", "B", "C", "D", "E") > > How to search documents that contain a number of terms in that list > > but do not care what terms are. > > For example, any documents that include any 3 terms in the above list are > > matched. > >

Re: A interesting question (search by number of terms)

2010-01-21 Thread Benjamin Heilbrunn
Try BooleanQuery.setMinimumNumberShouldMatch 2010/1/21 Phan The Dai: > Hi everyone, I need your support with this question: > Assuming that I have some terms, such as: ("A", "B", "C", "D", "E") > How to search documents that contain a nu
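The matching rule behind the suggestion above ("at least N of these optional terms must be present") can be sketched without Lucene. This is only an illustration of the semantics in plain Java, not the library's implementation; the class and method names are hypothetical, and a document is modelled as a simple list of tokens.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of "minimum number should match" semantics. */
final class MinShouldMatchSketch {

    /**
     * Returns true when at least minShouldMatch distinct query terms
     * occur in the document, regardless of which terms they are.
     */
    static boolean matches(List<String> docTokens, List<String> queryTerms, int minShouldMatch) {
        Set<String> present = new HashSet<>(docTokens);
        int hits = 0;
        // Deduplicate query terms so a repeated term is counted once.
        for (String term : new HashSet<>(queryTerms)) {
            if (present.contains(term)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }
}
```

With the list ("A", "B", "C", "D", "E") and a minimum of 3, a document containing A, C and E matches, while one containing only A and B does not.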

A interesting question (search by number of terms)

2010-01-21 Thread Phan The Dai
Hi everyone, I need your support with this question: Assuming that I have some terms, such as: ("A", "B", "C", "D", "E"), how can I search for documents that contain a number of terms in that list, without caring which terms they are? For example, any docume

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-13 Thread Paul Taylor
So not much help here (I wonder if it's because I posted 3 questions in one day), but I've made some progress in my understanding. I understand there is only one norm per field, and I think Lucene does not differentiate between adding the same field a number of times and adding multiple text to th

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Paul Taylor
Thanks Felipe, but you are missing the point. Artist really doesn't come into it; my problem is confined to the alias field. Forget about artist, it's just detailed to give the complete scenario. Paul Felipe wrote: You could change the boost of the field artist to be bigger than the field alias.

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Felipe
You could change the boost of the field artist to be bigger than the field alias. field.setBoost(artistBoost); 2010/1/12 Paul Taylor > Been doing some analysis with Luke (BTW doesn't work with StandardAnalyzer > since Version field introduced) and discovered a problem with field length > bo

Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Paul Taylor
Been doing some analysis with Luke (BTW doesn't work with StandardAnalyzer since Version field introduced) and discovered a problem with field length boosting for me. I have a document that represents a recording artist (e.g. Madonna, The Beatles, etc.); it contains an artist and an alias field

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread Michael McCandless
do something approximate outside of Lucene? EG, make >>>> a TokenFilter that counts how many tokens are produced for each >>>> field/doc, aggregate & store that yourself, and use it in your >>>> similarity impl? >>>> >>>> Mike >>>

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread kdev
ust >>> brainstorming type discussions now. >>> >>> You could always do something approximate outside of Lucene? EG, make >>> a TokenFilter that counts how many tokens are produced for each >>> field/doc, aggregate & store that yourself, and use it in

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread Michael McCandless
kenFilter that counts how many tokens are produced for each >> field/doc, aggregate & store that yourself, and use it in your >> similarity impl? >> >> Mike >> >> On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote: >>> >>> any ideas please? >>> --

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread kdev
ty impl? > > Mike > > On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote: >> >> any ideas please? >> -- >> View this message in context: >> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread Michael McCandless
how many tokens are produced for each field/doc, aggregate & store that yourself, and use it in your similarity impl? Mike On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote: > > any ideas please? > -- > View this message in context: > http://old.nabble.com/Scoring-formula---Average

Re: Scoring formula - Average number of terms in IDF

2009-12-15 Thread kdev
any ideas please? -- View this message in context: http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To

Scoring formula - Average number of terms in IDF

2009-11-10 Thread kdev
Hi, I want to change the default scoring formula of lucene and one of the changes I want to perform is on the idf term. What I want to do is to include the average number of terms of the documents indexed in the collection in the idf method of the Similarity class. In order to change the

Re: Retrieve number of terms

2008-01-10 Thread Luis Rodrigo
Hi Chris, by "number of terms", do you mean the number of different terms that compose the index, or the total number of terms, including repetitions? chris.b wrote: I'm sure this has been asked a few times before, but I searched and searched and found no answer (apart

Retrieve number of terms

2008-01-10 Thread chris.b
I'm sure this has been asked a few times before, but I searched and searched and found no answer (apart from using Luke), but I would like to know if there's a way of retrieving the number of terms in an index. I tried cycling through a TermEnum, but it doesn't do anything :| -- Vi

Re: Number of terms

2007-10-16 Thread sandeep chawla
Thanks a lot, but one question: the IndexOutput class doesn't have a writeFloat method. How do you write a float to the index? Shall I create a public method writeFloat as public void writeFloat(float f) { writeByte((byte)(f >>32); writeByte((byte)(f >>16); writeByte((byte)(f >>8); writeB
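As a side note on the snippet quoted above: Java does not allow the shift operators on a float, and a float is only 32 bits wide, so shifting by 32 would not select a byte in any case. A minimal sketch of one way to split a float into four bytes follows, using plain Java only. The class and method names are hypothetical, and the high-byte-first order is an assumption made for the example; this is not taken from the Lucene source.

```java
/** Hypothetical helper: converts a float to four bytes and back. */
final class FloatBytes {

    /** Encodes the IEEE 754 bit pattern of f, high byte first (assumed order). */
    static byte[] toBytes(float f) {
        int bits = Float.floatToIntBits(f); // reinterpret the float as a 32-bit int
        return new byte[] {
            (byte) (bits >>> 24),
            (byte) (bits >>> 16),
            (byte) (bits >>> 8),
            (byte) bits
        };
    }

    /** Decodes four bytes written by toBytes back into a float. */
    static float fromBytes(byte[] b) {
        int bits = ((b[0] & 0xFF) << 24)
                 | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8)
                 | (b[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }
}
```

Each element of the array returned by `toBytes` could then be passed to a byte-oriented writer such as the `writeByte` call used in the question, and read back in the same order.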

Re: Number of terms

2007-10-16 Thread Karl Wettin
On 16 Oct 2007, at 13:07, sandeep chawla wrote: While calculating the lengthnorm, there is a precision loss. http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting How to avoid the precision loss? You replace the use of bytes with floats when storing the norms (DocumentsWriter) in the f

Number of terms

2007-10-16 Thread sandeep chawla
Hi, While calculating the lengthnorm, there is a precision loss. http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting How can I avoid the precision loss? Thanks, Sandeep

how to get the number of terms in an index

2006-06-03 Thread Roxana Angheluta
Hello, Is it possible to quickly get the total number of terms from all documents in a Lucene index for a given field? For example IndexReader has a method "int numDocs()", I would need a similar method "int numTerms(String field)". It looks a bit silly to use IndexReader.t

Re: Scoring by number of terms in field

2006-01-10 Thread Eric Jain
Paul Elschot wrote: In case you prefer to use the maximum score over the clauses you can use the DisjunctionMaxQuery from the development version. Yes, that may help! I'll need to have a look...

Re: Scoring by number of terms in field

2006-01-10 Thread Paul Elschot
On Tuesday 10 January 2006 07:32, Eric Jain wrote: > Paul Elschot wrote: > >>For example, a query for "europe" should rank: > >> > >>1. title:"Europe" > >>2. title:"History of Europe" > >>3. title:"Travel in Europe, Middle East and Africa" > >>4. subtitle:"Fairy Tales from Europe" > > > > Perhaps

AW: Scoring by number of terms in field

2006-01-10 Thread Stefan Gusenbauer
e.org Subject: Re: Scoring by number of terms in field Paul Elschot wrote: >>For example, a query for "europe" should rank: >> >>1. title:"Europe" >>2. title:"History of Europe" >>3. title:"Travel in Europe, Middle East and Africa

Re: Scoring by number of terms in field

2006-01-09 Thread Eric Jain
Paul Elschot wrote: For example, a query for "europe" should rank: 1. title:"Europe" 2. title:"History of Europe" 3. title:"Travel in Europe, Middle East and Africa" 4. subtitle:"Fairy Tales from Europe" Perhaps with this query (assuming the default implicit OR): title:europe subtitle:europe^

Re: Scoring by number of terms in field

2006-01-09 Thread Erik Hatcher
Sorry for the quick reply, but yes you can accomplish this by tweaking a custom Similarity implementation (or DefaultSimilarity subclass). Check out IndexSearcher.explain on a query and a document and then tinker. Erik On Jan 9, 2006, at 4:34 AM, Eric Jain wrote: Lucene seems to

Re: Scoring by number of terms in field

2006-01-09 Thread Paul Elschot
On Monday 09 January 2006 10:34, Eric Jain wrote: > Lucene seems to prefer matches in shorter documents. Is it possible to > influence the scoring mechanism to have matches in shorter fields score > higher instead? A query is always in at least one field of a document. > > For example, a query

Scoring by number of terms in field

2006-01-09 Thread Eric Jain
Lucene seems to prefer matches in shorter documents. Is it possible to influence the scoring mechanism to have matches in shorter fields score higher instead? For example, a query for "europe" should rank: 1. title:"Europe" 2. title:"History of Europe" 3. title:"Travel in Europe, Middle East a