Hi Mike,

I have already finished walking through the terms, frequencies, and docs using TermEnum, TermDocs, and IndexReader. The only thing left is the scores. I will try your suggestion; I hope it works.
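
For the output step, here is a minimal sketch of how one line of the text file could be written once the per-document scores for a term are in hand. It is plain Java with no Lucene calls. The class name InvertedListFormatter and the method formatLine are hypothetical names made up for illustration. It also assumes that ft is the number of documents containing the term, which is what the example rows in the quoted message below suggest, and that three decimal places are enough.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): formats one line of the dump.
final class InvertedListFormatter {

    // Builds "term ft <docID, score> ..." with pairs in decreasing score order.
    // Assumption: ft is taken to be the number of documents containing the term.
    static String formatLine(String term, Map<Integer, Float> scoresByDoc) {
        List<Map.Entry<Integer, Float>> postings =
                new ArrayList<Map.Entry<Integer, Float>>(scoresByDoc.entrySet());
        // Highest score first; the sort is stable, so ties keep insertion order.
        postings.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));

        StringBuilder line = new StringBuilder();
        line.append(term).append(' ').append(postings.size());
        for (Map.Entry<Integer, Float> p : postings) {
            line.append(String.format(Locale.ROOT, " <%d, %.3f>", p.getKey(), p.getValue()));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Scores as they might be collected from a scorer loop, in docID order.
        Map<Integer, Float> big = new LinkedHashMap<Integer, Float>();
        big.put(2, 0.148f);
        big.put(3, 0.088f);
        System.out.println(formatLine("big", big));
    }
}
```

The map would be filled inside the scorer loop (docID as key, score as value), one map per term, and each formatted line appended to the output file.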

Thank you.
Sahin.

On Sat, Oct 2, 2010 at 5:30 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> It sounds like you can just use Lucene's enum APIs (IndexReader.terms,
> IndexReader.termDocs) to walk the entire index, converting it to your
> format?
>
> I'm not sure how Luke computes the "score"... but maybe you could, for
> every term, make a TermQuery and then directly walk its matching docs
> & scores? You'd have to do something like:
>
>   Scorer s = new TermQuery(term).weight(searcher).scorer(reader, true, false);
>
>   int docID;
>   while ((docID = s.nextDoc()) != Scorer.NO_MORE_DOCS) {
>     float score = s.score();
>   }
>
> I think?
>
> Mike
>
> On Fri, Oct 1, 2010 at 11:49 PM, Sahin Buyrukbilen
> <sahin.buyrukbi...@gmail.com> wrote:
> > Hi Erick,
> >
> > I mean the score of a term in a document (we can think of this as a
> > one-word query), which is calculated using "Default Similarity".
> > Actually, when I walk through my index term by term, Luke shows me the
> > number of documents in which the term exists, and for each document
> > there is a score field. Please check the attachment for the screenshot.
> > I am very new to the jargon of Lucene, so I am sorry if I explain
> > things in an incorrect way.
> >
> > My question is: for a term in the index, can we retrieve the value
> > (here I say score) calculated using default similarity? Is this a value
> > that is already stored in the index, or is it calculated on the fly by
> > Luke (since I can only see it by using Luke)?
> >
> > My goal is to create an inverted index and write it into a text file in
> > the following form:
> >
> > Term t   ft   Inverted list for t
> > ----------------------------------------------------------------------
> > big      2    <2, 0.148> <3, 0.088>
> > in       5    <6, 0.159> <2, 0.143> <5, 0.088> <1, 0.076> <4, 0.065>
> > ...
> > and so on for all terms.
> >
> > Here ft is the total frequency of term t in the whole index, and the
> > <docID, score> pairs give the ID of each document in which term t has a
> > score. The pairs are listed in decreasing order of score.
> >
> > I checked through the documentation and found the Scorer class, but
> > couldn't understand how to use it.
> >
> > I hope this is a better explanation.
> >
> > Best.
> > Sahin.
> >
> > On Fri, Oct 1, 2010 at 9:22 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >>
> >> I'm not sure what you're asking for. "Score of a term in a document"?
> >> Do you mean the amount a term contributed to a search for a particular
> >> document? The frequency of a term in a document? ???
> >>
> >> Could you elaborate on what you're trying to do? If you describe the
> >> problem you're trying to solve, people can provide better answers.
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Oct 1, 2010 at 11:33 AM, Sahin Buyrukbilen
> >> <sahin.buyrukbi...@gmail.com> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I need to retrieve the score of a term in a document. I don't want
> >> > to play with different scoring schemes. I just checked my index with
> >> > Luke, and it shows me a score for each term in each document in
> >> > which the term exists. So, I just need to get that score.
> >> >
> >> > Can anybody help me?
> >> >
> >> > Thank you in advance.
> >> >
> >> > Sahin.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org