Re: Which is faster/better

2008-12-01 Thread Ganesh
So in your UI, you'd like the delete to happen immediately and then it's OK if the updated (added) document then takes a minute to appear? Yes. Whenever a document's state is changed, it moves to a different store (basically a mail application; each mail has a state of deleted, junk, delivered, etc.)

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-01 Thread Karl Wettin
You could get the 2.4 code and set the serialVersionUID of the Term class to the UID assigned to the 2.3 Term class (554776219862331599l) and recompile. As for statically setting a serialVersionUID in the class, one could instead set it to a final value and implement Externalizable in order
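A minimal sketch of the idea described above, under stated assumptions: `TermLike` is a hypothetical stand-in class, not Lucene's actual `Term` source, and only the UID value is taken from the message. It shows that an explicitly declared `serialVersionUID` is what Java serialization reports for the class, while a class without one gets a value computed from its structure.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a class like Term; not Lucene's real source.
class TermLike implements Serializable {
    // Pinned to the 2.3 value quoted in the message, so a recompile keeps it stable.
    private static final long serialVersionUID = 554776219862331599L;
    String field;
    String text;
}

// Same shape without an explicit UID: the runtime derives one from the class structure.
class UnpinnedTermLike implements Serializable {
    String field;
    String text;
}

class SerialUidDemo {
    // Returns the UID that Java serialization will use for the given class.
    static long uidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:   " + uidOf(TermLike.class));
        System.out.println("unpinned: " + uidOf(UnpinnedTermLike.class));
    }
}
```

The trade-off raised in the thread still applies: once the value is pinned, it has to be changed by hand whenever the class changes incompatibly.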

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-01 Thread Jason Rutherglen
Hi Mike, Can you build and release a 2.4 jar using the 2.3 build environment? > Besides having to remember to change the serialVersionUID, are there any known downsides to setting it explicitly? As far as I know it's all good. Jason On Mon, Dec 1, 2008 at 6:11 PM, Michael McCandless < [EMAIL P

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-01 Thread Michael McCandless
Jason Rutherglen wrote: if you don't set serialVersionUID yourself, then java assigns a rather volatile one for you True however the Java specification defines how the serialVersionUID should be created in the event it's not defined. The caveat being it's not strictly enforced and so Sun

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-01 Thread Jason Rutherglen
> if you don't set serialVersionUID yourself, then java assigns a rather volatile one for you True however the Java specification defines how the serialVersionUID should be created in the event it's not defined. The caveat being it's not strictly enforced and so Sun alternative compilers may deci

Re: serialVersionUID issue between 2.3 and 2.4

2008-12-01 Thread Michael McCandless
Well.. if you don't set serialVersionUID yourself, then java assigns a rather volatile one for you so that it doesn't attempt to deserialize to an incompatible local class. We could assign one ourselves, and then we have to remember to change it if we ever make a big enough change to Term, to al

Re: Which is faster/better

2008-12-01 Thread Jason Rutherglen
It would be nice to have a pluggable solution for deleteddocs in IndexReader that accepts a Filter, and have BitVector implement Filter. This way I do not have to implement IndexReader.clone. On Mon, Dec 1, 2008 at 5:04 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > So in your UI, you'd

Re: Which is faster/better

2008-12-01 Thread Michael McCandless
So in your UI, you'd like the delete to happen immediately and then it's OK if the updated (added) document then takes a minute to appear? OK, I agree this (the immediacy of doing deletes via IndexReader) is a good reason to keep IndexReader.deleteDocument for now. Mike Ganesh wrote: I

serialVersionUID issue between 2.3 and 2.4

2008-12-01 Thread Jason Rutherglen
Seeing the following issue between Lucene 2.3 and 2.4. A 2.3 serialized Term object cannot be deserialized by 2.4. I would guess it has something to do with a different Java compiler being used for the Lucene 2.4 build as serialVersionUID is not defined in the Term class. Fixing the issue is crit

Re: Pdf in Lucene?

2008-12-01 Thread Grant Ingersoll
I certainly don't either, since you haven't said what the actual exception is. If I had to guess, though, I would say it is the line Document document = LucenePDFDocument.getDocument And that the Lucene library expected by PDFBox is not the same version of Lucene you are using. I would sug

Re: Pdf in Lucene?

2008-12-01 Thread Steven D. Majewski
On Dec 1, 2008, at 8:22 AM, Grant Ingersoll wrote: On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote: I tried to use pdfbox but gives me an error. That the version of lucene and the pdfbox are incompatible. Lucene knows nothing about PDFBox, so I don't see how they could be incompatibl

Re: Boosting fields are searching or indexing time?

2008-12-01 Thread Grant Ingersoll
Possibly, but probably not. Index time boosting is generally done to say one field is more important than another field, or one document is more important than another document, whereas query time boosting generally says this term is more important than that term. Additionally, search time

RE: Pdf in Lucene?

2008-12-01 Thread tiziano bernardi
This is my class. I use Eclipse and I don't get any errors, so I do not understand where the problem is. import java.io.File; import java.io.IOException; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Docum

Re: Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Ian Vink
The Lucene.NET implementation doesn't have: TopDocs search(Query query, int n) it only has: public virtual TopDocs Search( Query

Re: ID field - hundreds?

2008-12-01 Thread Erick Erickson
How are you measuring speed? See the FAQ about query speed... http://wiki.apache.org/jakarta-lucene/LuceneFAQ Also be aware that the first few queries have very significant overhead. You've really got to provide more detail to get meaningful help. How big is your index? How do you measure respon

Re: Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Erick Erickson
The usual reason for this is that you need to look at your analyzers. What analyzers are you using during index AND query time? The first thing I suspect here is capitalization. Are you sure that both your index and query analyzers fold case? Especially if you're new to Lucene, get a copy of Luke.
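To illustrate the case-folding point above, here is a toy sketch in plain Java. It does not use Lucene's analyzers; `analyze`, `index`, and `matches` are hypothetical helpers invented for this example. It only shows why a term indexed with case folding is missed when the query side does not fold case.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CaseFoldingDemo {
    // Splits on whitespace and optionally lowercases each token.
    static List<String> analyze(String text, boolean foldCase) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (t.isEmpty()) continue;
            tokens.add(foldCase ? t.toLowerCase(Locale.ROOT) : t);
        }
        return tokens;
    }

    // Builds the set of indexed terms for one document.
    static Set<String> index(String text, boolean foldCase) {
        return new HashSet<>(analyze(text, foldCase));
    }

    // Looks up a single query term, applying the query-side folding choice.
    static boolean matches(Set<String> indexedTerms, String queryTerm, boolean foldCase) {
        String term = foldCase ? queryTerm.toLowerCase(Locale.ROOT) : queryTerm;
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> terms = index("Hello World", true); // index side folds case
        System.out.println(matches(terms, "Hello", false)); // query side does not fold
        System.out.println(matches(terms, "Hello", true));  // query side folds too
    }
}
```

Inspecting the real index with Luke, as suggested above, shows which form the terms were actually stored in.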

Re: Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Ian Lea
How are you searching? Are you telling it to collect enough hits? e.g. if you are using method TopDocs search(Query query, int n) are you setting n high enough? -- Ian. On Mon, Dec 1, 2008 at 1:48 PM, Ian Vink <[EMAIL PROTECTED]> wrote: > But when I search (50,000 documents) I don't get all doc

ID field - hundreds?

2008-12-01 Thread Ian Vink
Each document has a field "DocID" with a unique int in the index. I want to search the documents with DocID of 1 or 2 or 5 or 8 etc (hundreds long list) When I specify a query like: [ contents:Hello DocID:1 DocID:2 ] etc it is slow. Is there a more efficient way to limit my search to books in

Re: Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Ian Vink
But when I search (50,000 documents) I don't get all documents with "Hello" in them. I get a lot, but not all. Ian On Mon, Dec 1, 2008 at 9:33 AM, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Dec 1, 2008, at 8:30 AM, Ian Vink wrote: > >> Is there a simple example on how to query for "contents:He

Re: Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Erik Hatcher
On Dec 1, 2008, at 8:30 AM, Ian Vink wrote: Is there a simple example on how to query for "contents:Hello" in all documents using MatchAllDocsQuery ? I want 100% of the docs with "Hello" You're looking

Newbie: MatchAllDocsQuery sample?

2008-12-01 Thread Ian Vink
Is there a simple example on how to query for "contents:Hello" in all documents using MatchAllDocsQuery ? I want 100% of the docs with "Hello" Ian

Re: Pdf in Lucene?

2008-12-01 Thread Grant Ingersoll
On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote: I tried to use pdfbox but gives me an error. That the version of lucene and the pdfbox are incompatible. Lucene knows nothing about PDFBox, so I don't see how they could be incompatible, unless you are referring to PDFBox's Lucene Docume

Re: Query time document group boosting

2008-12-01 Thread Toke Eskildsen
On Thu, 2008-11-27 at 20:55 +0100, Karl Wettin wrote: > A cosmetic remark, I would personally choose a single field for the > boosts and then one token per source. (groupboost:A^10 groupboost:B^1 > groupboost:C^0.1). Agreed. Thanks. > If I'm not mistaken CustomScoreQuery is a non matching qu

RE: Pdf in Lucene?

2008-12-01 Thread tiziano bernardi
I tried to use PDFBox, but it gives me an error saying that the versions of Lucene and PDFBox are incompatible. I use PDFBox 0.7.3 and Lucene 2.1.0. > Date: Mon, 1 Dec 2008 11:43:00 + > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Pdf in Lucene? > > Hi > > > Lucene only indexe

Re: Hits Max # of documents?

2008-12-01 Thread Grant Ingersoll
I'm not sure about .NET, but there should be a version of the search() method that returns a TopDocs instance. Looks like: TopDocs td = searcher.search(query, int) where the int is the number of results you want back, in your case, Integer.MAX_VALUE -Grant On Dec 1, 2008, at 6:38 AM, Ian

Re: Hits Max # of documents?

2008-12-01 Thread Ian Lea
From the (java) apidocs: Class MatchAllDocsQuery ... A query that matches all documents. Sounds like it should do the trick. -- Ian. On Mon, Dec 1, 2008 at 11:38 AM, Ian Vink <[EMAIL PROTECTED]> wrote: > (I'm using Lucene.NET but the APIs are close enough) > I'd like the search to always retu

Re: Pdf in Lucene?

2008-12-01 Thread Ian Lea
Hi Lucene only indexes text so you'll have to get the text out of the PDF and feed it to lucene. Google for lucene pdf, or go straight to http://www.pdfbox.org/ -- Ian. 2008/12/1 tiziano bernardi <[EMAIL PROTECTED]>: > > > Hi, > I want to index PDF files with lucene is possible? > What like

Hits Max # of documents?

2008-12-01 Thread Ian Vink
(I'm using Lucene.NET but the APIs are close enough) I'd like the search to always return all documents. I notice that it 'seems' to return only a percentage of them. Hits myHits = searcher.search(query); is what I use. Is there a way to force the searcher to give me everything? Ian

Pdf in Lucene?

2008-12-01 Thread tiziano bernardi
Hi, I want to index PDF files with Lucene. Is that possible? How? Thanks, Tiziano Bernardi

Pdf in Lucene?

2008-12-01 Thread tiziano bernardi
Hi, I want to index PDF files with Lucene. Is that possible? How? Thanks, Tiziano Bernardi

API changes in 2.4

2008-12-01 Thread Michael McCandless
Heads up: there are two API changes in 2.4 that might bite you on upgrading: * If you are subclassing QueryParser and override addClause or getBooleanQuery, you need to change the argument type from Vector to List, else your method won't be called. This was caused by LUCENE-1369,
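The "else your method won't be called" effect can be reproduced in plain Java. This is a sketch with hypothetical stand-in classes (`BaseParser`, `StaleParser`, `UpdatedParser`, and the `buildQuery` hook are invented here and are not Lucene's `QueryParser` API). A subclass method that still takes `Vector` becomes an overload rather than an override once the base method takes `List`, so the base class never dispatches to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical base class whose hook method now takes a List.
class BaseParser {
    protected String buildQuery(List<String> clauses) {
        return "base:" + clauses.size();
    }

    String parse() {
        List<String> clauses = new ArrayList<>();
        clauses.add("contents:hello");
        return buildQuery(clauses); // resolved against the List signature
    }
}

// Written against the old signature: this is an overload, not an override.
class StaleParser extends BaseParser {
    protected String buildQuery(Vector<String> clauses) {
        return "stale:" + clauses.size();
    }
}

// Updated to the new signature; @Override makes the compiler verify it.
class UpdatedParser extends BaseParser {
    @Override
    protected String buildQuery(List<String> clauses) {
        return "updated:" + clauses.size();
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        System.out.println(new StaleParser().parse());   // the subclass hook is skipped
        System.out.println(new UpdatedParser().parse()); // the subclass hook runs
    }
}
```

Annotating the subclass method with `@Override` turns this silent behavior change into a compile error, which is one way to catch it when upgrading.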

Re: Deleting from Index by URL field: is it safe?

2008-12-01 Thread Niels Ott
Hi all, German Kondolf wrote: It works exactly as it does when you search of that term. Review in your index creation, if you store it without analyzing it (Index.UN_TOKENIZED), it will only match that document when you have an exact URL. Is that also true if I simply use the KeywordAnalyze

Re: Marked for deletion

2008-12-01 Thread Erik Hatcher
On Dec 1, 2008, at 3:28 AM, Ganesh wrote: I need to index voluminous data and I plan to shard it. The client may not know which shard db to query. Server will take care of complete shard management. I have done almost 50% of development with Lucene. In case of Solr, I think the client sh

Re: Marked for deletion

2008-12-01 Thread Ganesh
I need to index voluminous data and I plan to shard it. The client may not know which shard db to query. Server will take care of complete shard management. I have done almost 50% of development with Lucene. In case of Solr, I think the client should be aware of which core or instance it want