Re: IndexReader#docFreq(Term)

2007-08-30 Thread Michael Busch
Chris Hostetter wrote: > unless i'm mistaken, docFreq isn't the only method affected by deleted docs, things like termDocs, termPositions, terms, ... pretty much all of the IndexReader methods work that way (even getFieldNames could be misleading if the only doc with a field of that name

Re: Re: Re: IndexReader#docFreq(Term)

2007-08-30 Thread tom
Tom Roberts is out of the office until 3rd September 2007 and will get back to you on his return. http://www.luxonline.org.uk http://www.lux.org.uk - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail:

Re: IndexReader#docFreq(Term)

2007-08-30 Thread Chris Hostetter
- /** Returns the number of documents containing the term t.
+ /** Returns the number of documents, including deleted, containing the term t.
there is a note about this in the javadocs for deleteDocument, but i agree it's not entirely clear ... unless i'm mistaken, docFreq isn't the only

IndexReader#docFreq(Term)

2007-08-30 Thread Karl Wettin
I was running into some problems that turned out to be an undocumented feature. Here is a javadoc suggestion:
- /** Returns the number of documents containing the term t.
+ /** Returns the number of documents, including deleted, containing the term t.
* @throws IOException if the
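To make the behaviour under discussion concrete, here is a minimal toy model of a postings list with a deletion bitmap. It is only a sketch and not Lucene code: the class `ToyPostingsSketch` and its methods are hypothetical names invented for illustration. It assumes that deleting a document only marks it, so a term's document frequency keeps counting it, as the proposed javadoc wording says.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names come from Lucene.
class ToyPostingsSketch {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();

    // Record that document docId contains the given term.
    void add(int docId, String term) {
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
    }

    // Deleting only marks the document; its postings stay in place.
    void delete(int docId) {
        deleted.set(docId);
    }

    // Counts every posting for the term, deleted or not.
    int docFreq(String term) {
        return postings.getOrDefault(term, Collections.emptyList()).size();
    }

    // Counts only postings whose document is not marked deleted.
    int liveDocFreq(String term) {
        int live = 0;
        for (int docId : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.get(docId)) {
                live++;
            }
        }
        return live;
    }
}
```

In this model, adding a term to two documents and deleting one of them leaves `docFreq` at 2 while `liveDocFreq` drops to 1, which is the kind of surprise the javadoc change is meant to flag.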

Re: BoostingTermQuery.explain() bugs

2007-08-30 Thread Chris Hostetter
Yes. JUnit would also be good, if you have it. if you want to write some there is a lot of good helper code already out there for making sure the hits and scores produced by a query match the explanations produced by that same query... [EMAIL PROTECTED]:~/svn/lucene-clean$ ls src/test/or

Re: Reduce copy error

2007-08-30 Thread Chris Hostetter
you should probably send this question to the nutch user mailing list (or perhaps the hadoop user mailing list) ... this is the mailing list for the Lucene java API that is used by nutch ... nothing in your stack trace seems to indicate any problems in any Lucene Java code. When i run nutch, i a

Re: How to speed-up index opening

2007-08-30 Thread Michael Busch
Antoine Baudoux wrote: > > > That's some good news! > > Any idea on the release date for 2.3? We're aiming for a release in early October. Keep your fingers crossed ;) - Michael

Re: unable to search from a string containing numbers seperated by comma.

2007-08-30 Thread Grant Ingersoll
Give these tips a try to see if they help: http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71 Luke is your friend. Cheers, Grant On Aug 30, 2007, at 6:06 AM, prabin meitei wrote: Hi, I am trying to search from an idlist (string containing comma s

Re: BoostingTermQuery.explain() bugs

2007-08-30 Thread Grant Ingersoll
On Aug 30, 2007, at 3:40 PM, Peter Keegan wrote: There are a couple of minor bugs in BoostingTermQuery.explain(). 1. The computation of average payload score produces NaN if no payloads were found. It should probably be: float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payload

Re: Lockless read-only deletions in IndexReader?

2007-08-30 Thread Karl Wettin
On 20 Aug 2007 at 14:33, Michael McCandless wrote: "karl wettin" <[EMAIL PROTECTED]> wrote: I want to set documents in my IndexReader as deleted, but I will never commit these deletions. Sort of a filter on a reader rather than on a searcher, and no write-locks. I could go hacking in IndexR

BoostingTermQuery.explain() bugs

2007-08-30 Thread Peter Keegan
There are a couple of minor bugs in BoostingTermQuery.explain(). 1. The computation of average payload score produces NaN if no payloads were found. It should probably be: float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1); 2. If the average payload sco
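The first point can be reproduced outside Lucene. The sketch below is a standalone illustration, not the BoostingTermQuery source: the class `PayloadAverageSketch` and the parameter `baseScore` (standing in for `super.score()`) are assumptions made for the example. It shows why dividing a zero float by a zero count yields NaN and how the guard proposed in the message avoids it.

```java
// Illustrative only: a standalone helper, not the BoostingTermQuery source.
class PayloadAverageSketch {

    // Unguarded form: with payloadScore == 0f and payloadsSeen == 0 this
    // evaluates 0f / 0, which is NaN in float arithmetic.
    static float unguarded(float baseScore, float payloadScore, int payloadsSeen) {
        return baseScore * (payloadScore / payloadsSeen);
    }

    // Guarded form, following the expression proposed in the message:
    // fall back to a neutral factor of 1 when no payloads were seen.
    static float guarded(float baseScore, float payloadScore, int payloadsSeen) {
        return baseScore * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);
    }

    public static void main(String[] args) {
        System.out.println(Float.isNaN(unguarded(2.0f, 0f, 0))); // true
        System.out.println(guarded(2.0f, 0f, 0));                // 2.0
        System.out.println(guarded(2.0f, 6.0f, 3));              // 4.0
    }
}
```

With the guard in place, a term without payloads keeps its base score unchanged instead of turning the whole score into NaN.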

SearchBlox Version 4.1 with Search Result Clustering released

2007-08-30 Thread Robert Selvaraj
SearchBlox uses the Lucene Search API and delivers out-of-the-box search functionality for rapid deployment and easy administration. SearchBlox provides integrated HTTP/HTTPS, File System and Feed crawlers, support for various document formats including HTML, Word, PDF, PowerPoint and Excel, suppo

Re: Scoring results?!

2007-08-30 Thread Peter Keegan
If I use BoostingTermQuery on a query containing terms without payloads, I get very different results than doing the same query with TermQuery. Presumably, this is because the BoostingSpanScorer/SpanScorer compute scores differently than TermScorer. Is there a way to make BoostingTermQuery behave l

Lucene indexing for pdf files

2007-08-30 Thread Madhu
Hi all, I am indexing PDF documents using pdfbox 7.4. It works fine for some PDF files, but for Japanese PDF files it gives the exception below: caught a class java.io.IOException with message: Unknown encoding for 'UniJIS-UCS2-H'. Can anyone help me with how to set the encoding while reading pd

Re: Lucene indexing

2007-08-30 Thread Karl Wettin
On 30 Aug 2007 at 11:24, Madhu wrote: Hi all, I am trying to index a 5 MB Excel file, but while indexing using POI 3 it gives me an out-of-memory exception. Does anyone know how to index large Excel files? Increase the maximum VM heap size? http://blogs.sun.com/watt/resource/jvm-

unable to search from a string containing numbers seperated by comma.

2007-08-30 Thread prabin meitei
Hi, I am trying to search from an idlist (string containing comma separated numeric values) eg: QueryParser vParser = new QueryParser("idlist", new AlphanumAnalyzer()); // analyzer using custom lettertokenizer which tokenizes numbers also. class is given below. Query q = vParser.parse("55"); // exa
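For a term query on "55" to match, the analyzer has to emit each id in the list as its own token. The sketch below shows that expected token stream in plain Java; it is not the poster's AlphanumAnalyzer (whose source is not shown here), and the class `IdListTokensSketch` is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows the tokens a comma-separated id field should yield.
class IdListTokensSketch {

    // Split on commas, trim whitespace, and drop empty pieces.
    static List<String> tokens(String idList) {
        List<String> out = new ArrayList<>();
        for (String part : idList.split(",")) {
            String token = part.trim();
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }
}
```

Comparing this list with what the real analyzer produces (for example by inspecting the index in Luke, as suggested in the reply) is one way to see whether ids are being split or merged unexpectedly.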

Re: Can a Lucene field be renamed in a Lucene index?

2007-08-30 Thread Erik Hatcher
On Aug 29, 2007, at 10:33 PM, George Aroush wrote: Just read the thread. Unfortunately, it doesn't offer a solution.
As I read it, it offered a number of solutions:
* Twiddle the *.fnm files (carefully)
* Use string substitution on the user's query, so "foo:whatever" -> "bar:whatever" unde
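The string-substitution idea can be sketched as a small rewrite step applied to the raw query text before it is parsed. This is a naive illustration under stated assumptions: the class `FieldAliasSketch` and method `renameField` are hypothetical, and the regex does not understand query syntax, so a quoted phrase that happens to contain "foo:" would also be rewritten.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites a field prefix in raw query text.
class FieldAliasSketch {

    // Replace "oldField:" with "newField:" wherever oldField starts at a word boundary.
    static String renameField(String query, String oldField, String newField) {
        Pattern prefix = Pattern.compile("\\b" + Pattern.quote(oldField) + ":");
        return prefix.matcher(query).replaceAll(Matcher.quoteReplacement(newField + ":"));
    }
}
```

For example, `renameField("foo:whatever AND title:foo", "foo", "bar")` rewrites only the field prefix and leaves the term `foo` under `title` alone.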

Lucene indexing

2007-08-30 Thread Madhu
Hi all, I am trying to index a 5 MB Excel file, but while indexing using POI 3 it gives me an out-of-memory exception. Does anyone know how to index large Excel files?

Re: How to speed-up index opening

2007-08-30 Thread Antoine Baudoux
On 30 Aug 2007 at 05:38, Michael Busch wrote: Chris Lu wrote: Hi, Antoine, It does take a long time to open the index reader. One thing you could do is to put new documents into one smaller index and re-open it, it should be much faster. We're planning to add a reopen() method to Ind

Re: How to speed-up index opening

2007-08-30 Thread Antoine Baudoux
On 29 Aug 2007 at 23:33, Chris Lu wrote: Hi, Antoine, It does take a long time to open the index reader. One thing you could do is to put new documents into one smaller index and re-open it, it should be much faster. Yes, but there is the problem of deleted/updated documents. Your so