Hi Zeynep,

I was facing the same issue in CarmelUniformTermPruningPolicy in the package org.apache.lucene.index.pruning.
I think the issue is in the while loop condition in the following piece of code:

    while ((docsPos < (docs.length - 1))
        && termPositions.doc() > docs[docsPos].doc) {
      docsPos++;
    }
    if (termPositions.doc() == docs[docsPos].doc) {
      // pass
      docsPos++; // move to next doc id
      return false;
    } else if (termPositions.doc() < docs[docsPos].doc) {
      return true; // skip this one - it's less important
    }
    // should not happen!
    throw new IOException("termPositions.doc > docs[docsPos].doc");

In the while loop, docsPos keeps getting incremented until the condition fails, which can happen in two cases:

1. docsPos < (docs.length - 1) becomes false, or
2. termPositions.doc() > docs[docsPos].doc becomes false.

The error occurs when docsPos < docs.length - 1 is false (docsPos has reached the last element) but termPositions.doc() > docs[docsPos].doc is still true. In that case neither the if branch nor the else-if branch matches, so execution falls through and the exception is thrown.

Fix: I added another condition, just above the statement that throws the exception, which returns true if docsPos == docs.length - 1.

I'm not sure if my fix is correct, but it seems to be working. Will update once I am certain.

Regards,
Jake

On Mon, May 7, 2012 at 10:52 AM, Zeynep P. <zp...@yahoo.com> wrote:

> Thanks for the link. I reviewed it.
> Here are more details about the exception:
>
> I used contrib/benchmark/conf/wikipedia.alg to index wikipedia dump with
> MAddDocs: 200000. I wanted to index only a specific period of time so I
> added an if statement in doLogic of AddDocTask class.
> I tried to prune the index by using pruning package (CarmelTopKPruning) and
> I had the exception.
>
> I added System.out.println(term); as the first line of the
> initPositionsTerm and System.out.println("***" + term); as the last line of
> it. Carmel top k exception comes from pruneAllPositions (throw new
> IOException("termPositions.doc > docs[docsPos].doc"); ).
>
> For example, for token body:freely I had the output as follows:
>
> body:freely
> ***body:freely
> body:freely
> ***body:freely
> body:freely
> ***body:freely
> Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() = 4995)
> Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() = 4996)
> Carmel topk in exception (docs[docsPos].doc = 4414, termPositions.doc() = 4997)
> ..
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> Carmel topk in exception
> body:freely
> ***body:freely
> Carmel topk in exception
> Carmel topk in exception
> body:freely
> ***body:freely
> body:freely
> ***body:freely
>
> I hope that my problem is more clear now.
>
> Thanks in advance,
> Best Regards
> ZP
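
P.S. In case it helps, here is a minimal standalone sketch of the fix described at the top of this mail. It is not the Lucene source and does not depend on Lucene. PruneCursor, keptDocs and currentDoc are hypothetical names standing in for the policy class, docs[i].doc and termPositions.doc(). The early return when docsPos has run past the end of the array is my own assumption to keep the sketch safe, and it throws IllegalStateException where the original code throws IOException.

```java
final class PruneCursor {
    // Doc ids to keep, sorted ascending (hypothetical stand-in for docs[i].doc).
    private final int[] keptDocs;
    private int docsPos = 0;

    PruneCursor(int[] keptDocs) {
        this.keptDocs = keptDocs.clone();
    }

    /** Returns true if all positions of currentDoc should be pruned. */
    boolean pruneAllPositions(int currentDoc) {
        // Assumption: once every kept doc id is used up, prune the rest.
        if (docsPos >= keptDocs.length) {
            return true;
        }
        // Same loop as in the mail: stops at the last element even if
        // currentDoc is still greater than keptDocs[docsPos].
        while (docsPos < keptDocs.length - 1 && currentDoc > keptDocs[docsPos]) {
            docsPos++;
        }
        if (currentDoc == keptDocs[docsPos]) {
            docsPos++; // move to next doc id
            return false; // keep this doc
        } else if (currentDoc < keptDocs[docsPos]) {
            return true; // not in the kept list, prune it
        }
        // Added guard: we are at the last kept doc id and currentDoc is
        // beyond it, so it cannot be in the kept list. Prune instead of failing.
        if (docsPos == keptDocs.length - 1) {
            return true;
        }
        // Original code throws IOException here; unchecked in this sketch.
        throw new IllegalStateException("currentDoc > keptDocs[docsPos]");
    }

    public static void main(String[] args) {
        // Doc ids chosen to mirror the log output quoted above.
        PruneCursor cursor = new PruneCursor(new int[] {4000, 4414});
        for (int doc : new int[] {4000, 4995, 4996, 4997}) {
            System.out.println(doc + " -> " + (cursor.pruneAllPositions(doc) ? "prune" : "keep"));
        }
    }
}
```

Without the added guard, the call for doc 4995 would reach the throw statement, which matches the situation in the log (docs[docsPos].doc = 4414, termPositions.doc() = 4995). Whether returning true is the right behaviour inside the real policy is something I have not verified.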