subject:"Number of terms"

Re: getting number of terms in a document/field

2015-02-08 Thread Ahmet Arslan

Hi, Sorry for my ignorance, but how do I obtain an AtomicReader from an IndexReader? I figured out the above code, but it gives me a list of atomic readers. for (AtomicReaderContext context : reader.leaves()) { NumericDocValues docValues = context.reader().getNormValues(field); if (docValues != null) normValu

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless

On Fri, Feb 6, 2015 at 8:51 AM, Ahmet Arslan wrote: > Hi Michael, > > Thanks for the explanation. I am working with a TREC dataset, > since it is static, I set size of that array experimentally. > > I followed the DefaultSimilarity#lengthNorm method a bit. > > If default similarity and no index ti

Re: getting number of terms in a document/field

2015-02-06 Thread Ahmet Arslan

? Thanks, Ahmet On Friday, February 6, 2015 11:08 AM, Michael McCandless wrote: How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it

Re: getting number of terms in a document/field

2015-02-06 Thread Michael McCandless

How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large... Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, yo
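The suggestion to recover an approximate term count from the norm can be sketched in plain Java. This is only an illustration under stated assumptions: it assumes the length norm was computed as 1/sqrt(numTerms) (the shape of DefaultSimilarity's lengthNorm), that no boost was applied, and it ignores the additional precision lost when the norm is stored compactly in the index. The class and method names are hypothetical and are not Lucene API.

```java
// Hypothetical helper, not part of Lucene: inverts an assumed 1/sqrt(numTerms) length norm.
class NormLengthSketch {

    // Assumed norm formula: 1 / sqrt(number of terms in the field), with no boost.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Inverse of the assumed formula: numTerms is approximately 1 / norm^2.
    static long approxNumTerms(float norm) {
        double n = norm;
        return Math.round(1.0 / (n * n));
    }

    public static void main(String[] args) {
        float norm = lengthNorm(100);
        System.out.println(approxNumTerms(norm));
    }
}
```

With a norm that has been quantized for storage, the recovered value would only be a rough estimate of the field length, which matches the "approximately" in the reply above.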

getting number of terms in a document/field

2015-02-05 Thread Ahmet Arslan

Hello Lucene Users, I am traversing all documents that contain a given term with the following code: Term term = new Term(field, word); Bits bits = MultiFields.getLiveDocs(reader); DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field, term.bytes()); while (docsEnum.nextDoc() != Doc

Re: A interesting question (search by number of terms)

2010-01-21 Thread Phan The Dai

"A", "B", "C", "D", "E") > > How to search documents that contain a number of terms in that list > > but do not care what terms are. > > For example, any documents that include any 3 terms in the above list are > > matched. > >

Re: A interesting question (search by number of terms)

2010-01-21 Thread Benjamin Heilbrunn

Try BooleanQuery.setMinimumNumberShouldMatch 2010/1/21 Phan The Dai : > Hi everyone, I need your support with this question: > Assuming that I have some terms, such as: ("A", "B", "C", "D", "E") > How to search documents that contain a nu
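The matching rule that a minimum-should-match threshold expresses can be sketched in plain Java, without Lucene. This is an illustration of the semantics only, assuming the query terms are distinct and each document is reduced to a set of its terms; the class and method names are hypothetical. In Lucene itself, the reply above points to BooleanQuery.setMinimumNumberShouldMatch applied to a query whose clauses are all optional.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene code.
class MinShouldMatchSketch {

    // Returns true when at least minShouldMatch of the optional terms occur in the document.
    // Assumes optionalTerms contains no duplicates.
    static boolean matches(Set<String> docTerms, List<String> optionalTerms, int minShouldMatch) {
        int hits = 0;
        for (String term : optionalTerms) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("A", "B", "C", "D", "E");
        Set<String> doc1 = new HashSet<>(Arrays.asList("A", "C", "E", "X")); // contains 3 query terms
        Set<String> doc2 = new HashSet<>(Arrays.asList("B", "D", "X"));      // contains 2 query terms
        System.out.println(matches(doc1, query, 3));
        System.out.println(matches(doc2, query, 3));
    }
}
```

Here the first document matches because it contains three of the five listed terms, regardless of which three, while the second does not.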

A interesting question (search by number of terms)

2010-01-21 Thread Phan The Dai

Hi everyone, I need your support with this question: Assuming that I have some terms, such as: ("A", "B", "C", "D", "E") How to search documents that contain a number of terms in that list but do not care what terms are. For example, any docume

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-13 Thread Paul Taylor

So not much help here (I wonder if it's because I posted 3 questions in one day), but I've made some progress in my understanding. I understand there is only one norm per field, and I think Lucene does not differentiate between adding the same field a number of times and adding multiple text to th

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Paul Taylor

Thanks Felipe, but you are missing the point. Artist really doesn't come into it; my problem is confined to the alias field. Forget about artist, it's just detailed to give the complete scenario. Paul Felipe wrote: You could change the boost of the field artist to be bigger than the field alias.

Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Felipe

You could change the boost of the field artist to be bigger than the field alias. field.setBoost(artistBoost); 2010/1/12 Paul Taylor > Been doing some analysis with Luke (BTW doesn't work with StandardAnalyzer > since Version field introduced) and discovered a problem with field length > bo

Is there any difference in a document between one added field with a number of terms and a field added a number of times ?

2010-01-12 Thread Paul Taylor

Been doing some analysis with Luke (BTW doesn't work with StandardAnalyzer since Version field introduced) and discovered a problem with field length boosting for me. I have a document that represents a recording artist (e.g. Madonna, The Beatles, etc.); it contains an artist and an alias field

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread Michael McCandless

do something approximate outside of Lucene? EG, make >>>> a TokenFilter that counts how many tokens are produced for each >>>> field/doc, aggregate & store that yourself, and use it in your >>>> similarity impl? >>>> >>>> Mike >>>

Re: Scoring formula - Average number of terms in IDF

2009-12-18 Thread kdev

ust >>> brainstorming type discussions now. >>> >>> You could always do something approximate outside of Lucene? EG, make >>> a TokenFilter that counts how many tokens are produced for each >>> field/doc, aggregate & store that yourself, and use it in

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread Michael McCandless

kenFilter that counts how many tokens are produced for each >> field/doc, aggregate & store that yourself, and use it in your >> similarity impl? >> >> Mike >> >> On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote: >>> >>> any ideas please? >>> --

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread kdev

ty impl? > > Mike > > On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote: >> >> any ideas please? >> -- >> View this message in context: >> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF

Re: Scoring formula - Average number of terms in IDF

2009-12-17 Thread Michael McCandless

how many tokens are produced for each field/doc, aggregate & store that yourself, and use it in your similarity impl? Mike On Tue, Dec 15, 2009 at 5:04 AM, kdev wrote: > > any ideas please? > -- > View this message in context: > http://old.nabble.com/Scoring-formula---Average

Re: Scoring formula - Average number of terms in IDF

2009-12-15 Thread kdev

any ideas please? -- View this message in context: http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Scoring formula - Average number of terms in IDF

2009-11-10 Thread kdev

Hi, I want to change the default scoring formula of lucene and one of the changes I want to perform is on the idf term. What I want to do is to include the average number of terms of the documents indexed in the collection in the idf method of the Similarity class. In order to change the

Re: Retrieve number of terms

2008-01-10 Thread Luis Rodrigo

Hi Chris, by "number of terms", do you mean the number of different terms that compose the index, or the total number of terms, including repetitions? chris.b wrote: I'm sure this has been asked a few times before, but i searched and searched and found no answer (apart

Retrieve number of terms

2008-01-10 Thread chris.b

I'm sure this has been asked a few times before, but I searched and searched and found no answer (apart from using Luke), but I would like to know if there's a way of retrieving the number of terms in an index. I tried cycling through a TermEnum, but it doesn't do anything :| -- Vi

Re: Number of terms

2007-10-16 Thread sandeep chawla

Thanks a lot, but one question: the IndexOutput class doesn't have a writeFloat method? How do you write a float to the index? Shall I create a public method writeFloat as public void writeFloat(float f) { writeByte((byte)(f >>32); writeByte((byte)(f >>16); writeByte((byte)(f >>8); writeB
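As a note on the attempt above: Java does not allow the shift operators on a float, so the value has to be converted to its 32-bit pattern first, and the shifts for four bytes are 24, 16, 8 and 0 rather than starting at 32. A minimal plain-Java sketch of that conversion follows; the class and method names are hypothetical, and how the resulting bytes would be written into a Lucene index is not shown here.

```java
// Hypothetical helper, not part of Lucene: converts a float to four big-endian bytes and back.
class FloatBytesSketch {

    // Takes the IEEE 754 bit pattern of the float and splits it into four bytes, highest byte first.
    static byte[] floatToBytes(float f) {
        int bits = Float.floatToIntBits(f);
        return new byte[] {
            (byte) (bits >>> 24),
            (byte) (bits >>> 16),
            (byte) (bits >>> 8),
            (byte) bits
        };
    }

    // Reassembles the four bytes into the original bit pattern and converts it back to a float.
    static float bytesToFloat(byte[] b) {
        int bits = ((b[0] & 0xFF) << 24)
                 | ((b[1] & 0xFF) << 16)
                 | ((b[2] & 0xFF) << 8)
                 | (b[3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte[] encoded = floatToBytes(0.3125f);
        System.out.println(bytesToFloat(encoded));
    }
}
```

Each byte of the result could then be passed to a byte-oriented writer such as the writeByte calls in the question.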

Re: Number of terms

2007-10-16 Thread Karl Wettin

On 16 Oct 2007 at 13:07, sandeep chawla wrote: While calculating the lengthnorm- there is a precision-loss. http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting How to avoid the precision loss? You replace the use of bytes with floats when storing the norms (DocumentsWriter) in the f

Number of terms

2007-10-16 Thread sandeep chawla

Hi, While calculating the lengthNorm, there is a precision loss. http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting How to avoid the precision loss? Thanks, Sandeep

how to get the number of terms in an index

2006-06-03 Thread Roxana Angheluta

Hello, Is it possible to quickly get the total number of terms from all documents in a Lucene index for a given field? For example IndexReader has a method "int numDocs()", I would need a similar method "int numTerms(String field)". It looks a bit silly to use IndexReader.t

Re: Scoring by number of terms in field

2006-01-10 Thread Eric Jain

Paul Elschot wrote: In case you prefer to use the maximum score over the clauses you can use the DisjunctionMaxQuery from the development version. Yes, that may help! I'll need to have a look...

Re: Scoring by number of terms in field

2006-01-10 Thread Paul Elschot

On Tuesday 10 January 2006 07:32, Eric Jain wrote: > Paul Elschot wrote: > >>For example, a query for "europe" should rank: > >> > >>1. title:"Europe" > >>2. title:"History of Europe" > >>3. title:"Travel in Europe, Middle East and Africa" > >>4. subtitle:"Fairy Tales from Europe" > > > > Perhaps

AW: Scoring by number of terms in field

2006-01-10 Thread Stefan Gusenbauer

e.org Subject: Re: Scoring by number of terms in field Paul Elschot wrote: >>For example, a query for "europe" should rank: >> >>1. title:"Europe" >>2. title:"History of Europe" >>3. title:"Travel in Europe, Middle East and Africa

Re: Scoring by number of terms in field

2006-01-09 Thread Eric Jain

Paul Elschot wrote: For example, a query for "europe" should rank: 1. title:"Europe" 2. title:"History of Europe" 3. title:"Travel in Europe, Middle East and Africa" 4. subtitle:"Fairy Tales from Europe" Perhaps with this query (assuming the default implicit OR): title:europe subtitle:europe^

Re: Scoring by number of terms in field

2006-01-09 Thread Erik Hatcher

Sorry for the quick reply, but yes you can accomplish this by tweaking a custom Similarity implementation (or DefaultSimilarity subclass). Check out IndexSearcher.explain on a query and a document and then tinker. Erik On Jan 9, 2006, at 4:34 AM, Eric Jain wrote: Lucene seems to

Re: Scoring by number of terms in field

2006-01-09 Thread Paul Elschot

On Monday 09 January 2006 10:34, Eric Jain wrote: > Lucene seems to prefer matches in shorter documents. Is it possible to > influence the scoring mechanism to have matches in shorter fields score > higher instead? A query is always in at least one field of a document. > > For example, a query

Scoring by number of terms in field

2006-01-09 Thread Eric Jain

Lucene seems to prefer matches in shorter documents. Is it possible to influence the scoring mechanism to have matches in shorter fields score higher instead? For example, a query for "europe" should rank: 1. title:"Europe" 2. title:"History of Europe" 3. title:"Travel in Europe, Middle East a

32 matches
