Hi Mike and all other friends,

The code below does almost what I want. The only thing left is to write the <docID, score> pairs ordered by score rather than by docID; the current implementation writes them in docID order. I will try to solve that problem myself, but if anybody knows how to do it, I would appreciate it if you shared it with me.
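One possible way to get the score ordering, sketched here without any Lucene calls: collect each term's <docID, score> pairs into a list, sort the list by descending score, and only then write the line. The class PostingSort, the holder Posting, and the methods sortByScoreDescending and formatLine are hypothetical names made up for this illustration; the sample values come from the "in" row of the example table quoted further down the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PostingSort {
    /** Hypothetical holder for one <docID, score> pair (not a Lucene class). */
    static final class Posting {
        final int docId;
        final float score;

        Posting(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Sorts in place: highest score first, ties broken by ascending docID. */
    static void sortByScoreDescending(List<Posting> postings) {
        Collections.sort(postings, new Comparator<Posting>() {
            public int compare(Posting a, Posting b) {
                int byScore = Float.compare(b.score, a.score);
                return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
            }
        });
    }

    /** Builds one output line: term, number of pairs, then the sorted pairs. */
    static String formatLine(String term, List<Posting> postings) {
        sortByScoreDescending(postings);
        StringBuilder sb = new StringBuilder(term);
        sb.append(' ').append(postings.size()).append(' ');
        for (Posting p : postings) {
            sb.append('<').append(p.docId).append(',').append(p.score).append("> ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pairs arrive in docID order, as the index walk produces them.
        List<Posting> postings = new ArrayList<Posting>();
        postings.add(new Posting(1, 0.076f));
        postings.add(new Posting(2, 0.143f));
        postings.add(new Posting(4, 0.065f));
        postings.add(new Posting(5, 0.088f));
        postings.add(new Posting(6, 0.159f));
        System.out.println(formatLine("in", postings));
    }
}
```

In the index walk, this would mean adding a Posting to the list inside the inner loop instead of writing it straight away, and writing the formatted line once the inner loop has finished. The cost is holding one term's postings in memory at a time, which may matter for very frequent terms.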
I am adding my code just in case anybody needs it.

Regards.
Sahin.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class YilmazTest {
    public static void main(String[] args) {
        try {
            // output file
            BufferedWriter out = new BufferedWriter(new FileWriter("/home/guardian/Lucene/output"));
            Directory dir = FSDirectory.open(new File("/home/guardian/Lucene/Indexes"));
            IndexReader reader = IndexReader.open(dir);
            TermEnum termEnum = reader.terms();
            IndexSearcher searcher = new IndexSearcher(dir, true);
            System.out.println(reader.numDocs());

            // Walk every term in the index.
            while (termEnum.next()) {
                TermDocs termDocs = reader.termDocs(termEnum.term());
                TermQuery tq = new TermQuery(new Term(termEnum.term().field(), termEnum.term().text()));
                Scorer s = tq.weight(searcher).scorer(reader, true, false);
                boolean once = true;
                // termDocs and the scorer are advanced in step, so both are
                // expected to be positioned on the same document.
                while (termDocs.next()) {
                    if (once) {
                        // Write the term and its document frequency once per line.
                        out.write(termEnum.term().text() + " ");
                        out.write(termEnum.docFreq() + " ");
                    }
                    s.nextDoc();
                    out.write("<" + termDocs.doc() + "," + s.score() + "> ");
                    once = false;
                }
                out.newLine();
            }
            out.close();
        } catch (IOException ex) {
            // Report the failure instead of silently swallowing it.
            ex.printStackTrace();
        }
    }
}

On Sat, Oct 2, 2010 at 9:42 AM, Sahin Buyrukbilen <sahin.buyrukbi...@gmail.com> wrote:

> Hi Mike,
>
> I am already done with walking through the terms, frequencies and the docs
> by using termenum, termdocs, and indexreader. The only thing left is the
> scores. I will try your suggestion. Hope it works.
>
> Thank you.
>
> Sahin.
>
> On Sat, Oct 2, 2010 at 5:30 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> It sounds like you can just use Lucene's enum APIs (IndexReader.terms,
>> IndexReader.termDocs) to walk the entire index, converting it to your
>> format?
>>
>> I'm not sure how Luke computes the "score"... but maybe you could, for
>> every term, make a TermQuery and then directly walk its matching docs
>> & scores? You'd have to do something like:
>>
>>   Scorer s = TermQuery.weight(searcher).scorer(reader, true, false);
>>
>>   int docID;
>>   while((docID = s.nextDoc()) != Scorer.NO_MORE_DOCS) {
>>     float score = s.score();
>>   }
>>
>> I think?
>>
>> Mike
>>
>> On Fri, Oct 1, 2010 at 11:49 PM, Sahin Buyrukbilen
>> <sahin.buyrukbi...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > I mean the score of a term in a document (we can think this as a one word
>> > query) which is calculated by using "Default Similarity". Actually, when I
>> > walk through my index term-by-term, Luke shows me the number of documents in
>> > which the term exists. And for each document there is a score field. please
>> > check the attachment for the screenshot. I am very new to the jargon of
>> > Lucene, so I am sorry if I explain things in an incorrect way.
>> >
>> > My question is: For a term in the index, can we retrieve the value (here I
>> > say score) calculated by using default similarity? Is this a value which is
>> > already stored in the index or is it calculated on the fly by Luke (since I
>> > can only see by using Luke)?
>> >
>> > My goal is to create an inverted index and write it into a text file in the
>> > following form:
>> >
>> > Term t   ft   Inverted list for t
>> > ----------------------------------------------------------------------------------
>> > big      2    <2, 0.148> <3, 0.088>
>> > in       5    <6, 0.159> <2, 0.143> <5, 0.088> <1, 0.076> <4, 0.065>
>> > -
>> > -
>> > -
>> > -
>> > -
>> > so on for all terms. Here ft is the total frequency of term t in the whole
>> > index, <docID , score > pairs are ID of the document in which term t has a
>> > score, and these pairs are listed according to the decreasing order of
>> > scores.
>> >
>> >
>> > I checked through the documentation, and found scorer class but couldnt
>> > understand how to use it.
>> >
>> > I hope this is a kind of better explanation.
>> >
>> > Best.
>> > Sahin.
>> >
>> >
>> > On Fri, Oct 1, 2010 at 9:22 PM, Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >>
>> >> I'm not sure what you're asking for. "Score of a term in a document"?
>> >> Do you mean the amount a term contributed to a search for a particular
>> >> document? The frequency of a term in a document? ???
>> >>
>> >> Could you elaborate on what you're trying to do? If you describe the
>> >> problem you're trying to solve, people can provide better answers.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, Oct 1, 2010 at 11:33 AM, Sahin Buyrukbilen <
>> >> sahin.buyrukbi...@gmail.com> wrote:
>> >>
>> >> > Hi all,
>> >> >
>> >> > I need to retrieve the score of a term in a document? I dont want to play
>> >> > different scoring schemes. I just checked my index with Luke and it shows me
>> >> > a score for each term in each document the term exists. So, I need just to
>> >> > get that score.
>> >> >
>> >> > Can anybody help me?
>> >> >
>> >> > Thank you in advance.
>> >> >
>> >> > Sahin.
>> >> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >