Query regarding usage of Lucene(Filtering folder)

2008-02-27 Thread Ravinder.Teepiredddy
Hi All, I had a query regarding usage of lucene. I have done the indexing for the files kept in root folder -> subfolder-> Subfolder structure. When I make the search with particular word it returns me the list of matching files across the folder structure right from root to the last subfolde

Query regarding usage of Lucene - Filtering folders

2008-02-27 Thread Mohammad.Ahmed
Hi I would like to join java-user mailing list. I had a query regarding usage of lucene. I have done the indexing for the files kept in root folder -> subfolder -> subfolder structure. When I make the search with particular word it returns me the list of matching files across the f
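
Both posts ask how to limit hits to one branch of the folder tree. One common approach (an assumption here, not something stated in the thread) is to index each file's folder path in an untokenized field and restrict the search with a prefix match, for example a PrefixQuery combined with the keyword query. The sketch below simulates only that prefix-restriction step in plain Java over a hypothetical list of hit paths. It makes no Lucene calls, and the method name `restrictToFolder` is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FolderFilterSketch {

    // Keeps only hits whose stored path lies at or below the given folder.
    // Assumes paths are normalized with '/' separators (hypothetical convention).
    static List<String> restrictToFolder(List<String> hitPaths, String folder) {
        // Add a trailing slash so "/root/sub" does not also match "/root/sub2".
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        List<String> kept = new ArrayList<>();
        for (String path : hitPaths) {
            if (path.startsWith(prefix)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList(
                "/root/a.txt",
                "/root/sub/b.txt",
                "/root/sub/deeper/c.txt",
                "/root/sub2/d.txt");
        System.out.println(restrictToFolder(hits, "/root/sub"));
    }
}
```

The trailing-slash normalization matters whichever engine does the matching. A bare string prefix would treat sibling folders that share a name prefix as descendants.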

Re: Lucene Search Performance

2008-02-27 Thread Jamie
Hi, Thanks for the suggestions. This would require us to change the index, and right now we literally have millions of documents stored in the current index format. I'll bear it in mind, but I am not entirely sure how I would go about implementing the change at this point. Much appreciated, Jamie

Re: Lucene Search Performance

2008-02-27 Thread h t
1. Redefine the archivedate field in YYYYMMDD format. 2. Add another field holding the timestamp, for sorting. 3. Use a RangeFilter to get the results, then sort by timestamp. 2008/2/27, Jamie <[EMAIL PROTECTED]>: > > Hi Michael & Others > > Ok. I've gathered some more statistics from a different machine for
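
The suggestion above splits one date into two values: a coarse, day-precision string used only for range filtering, and a fine-grained number used only for sorting. The plain-Java sketch below derives both values from one timestamp and shows that a fixed-width yyyyMMdd string compares in chronological order. It assumes UTC. The second field name ("archivetimestamp" in a real index) is hypothetical. Lucene's DateTools with DAY resolution should yield strings of this same form, but the sketch does not depend on it.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DateFieldsSketch {

    // Fixed-width yyyyMMdd: lexicographic order equals chronological order.
    // Months must be "MM"; lowercase "mm" means minutes in Java date patterns.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Value for the coarse field that the range filter runs on (e.g. "archivedate").
    static String dayTerm(LocalDateTime t) {
        return t.format(DAY);
    }

    // Value for a separate, hypothetical sort field: seconds since the epoch, UTC assumed.
    static long sortKey(LocalDateTime t) {
        return t.toEpochSecond(ZoneOffset.UTC);
    }

    // Inclusive range test on day terms, as a range filter over that field would apply.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2008, 2, 27, 13, 31, 5);
        String day = dayTerm(t);
        System.out.println(day + " " + sortKey(t) + " " + inRange(day, "20080201", "20080229"));
    }
}
```

Filtering on the coarse field keeps the number of distinct terms in any range small, while the separate sort key preserves full ordering within a day.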

RE: How do i get a text summary

2008-02-27 Thread Ravinder.Teepiredddy
Hi John, I am getting a null Summary value in the results.jsp page, and I need a "snippet" or "fragment" to be highlighted. I have gone through the related Lucene FAQs, but it's not clear. I would appreciate it if you could help me find the list of (Java) files to be modified. Thanks in advance. Ravinder -Original
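
To illustrate what a highlighted "fragment" is, the sketch below takes stored text and a single query term, cuts a window around the first match, and wraps matches in `<b>` tags. It is a hand-rolled illustration and not Lucene's contrib Highlighter. It ignores analysis (stemming, tokenization), matches case-insensitively, and assumes lower-casing does not change the string's length, which holds for ASCII text. The method name `snippet` is hypothetical.

```java
import java.util.Locale;

final class SnippetSketch {

    // Returns roughly 'radius' characters on each side of the first match of 'term',
    // with every match fully inside the window wrapped in <b> tags, or null if absent.
    static String snippet(String text, String term, int radius) {
        String lowerTerm = term.toLowerCase(Locale.ROOT);
        if (lowerTerm.isEmpty()) {
            return null;
        }
        String lowerText = text.toLowerCase(Locale.ROOT); // assumes same length as text
        int hit = lowerText.indexOf(lowerTerm);
        if (hit < 0) {
            return null;
        }
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hit + lowerTerm.length() + radius);
        StringBuilder out = new StringBuilder();
        if (start > 0) out.append("...");
        int pos = start;
        int next = lowerText.indexOf(lowerTerm, pos);
        while (next >= 0 && next + lowerTerm.length() <= end) {
            out.append(text, pos, next)
               .append("<b>")
               .append(text, next, next + lowerTerm.length())
               .append("</b>");
            pos = next + lowerTerm.length();
            next = lowerText.indexOf(lowerTerm, pos);
        }
        out.append(text, pos, end);
        if (end < text.length()) out.append("...");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(snippet("Lucene is a search library. Lucene is fast.", "lucene", 10));
    }
}
```

A real highlighter would reuse the index-time analyzer so that the highlighted tokens line up with what actually matched.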

RE: How do i get a text summary

2008-02-27 Thread John Griffin
Ravinder, If you want something from an index it has to be IN the index. So, store a summary field in each document and make sure that field is part of the query. John G. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, February 27, 2008 7:58 PM To:
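
Following that advice, the summary can be computed once at index time and stored with each document, so that it comes back with every hit. In Lucene 2.x a field added with `Field.Store.YES` is returned with the hit's document. The sketch below covers only the summary-building step: it keeps the first `maxChars` characters and backs off to the last space so no word is split. The method name and length limit are hypothetical choices.

```java
final class SummarySketch {

    // Collapses whitespace, then keeps at most maxChars characters,
    // cutting back to the last space so no word is split.
    static String summarize(String text, int maxChars) {
        String clean = text.trim().replaceAll("\\s+", " ");
        if (clean.length() <= maxChars) {
            return clean;
        }
        int cut = clean.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // one very long leading word: fall back to a hard cut
        }
        return clean.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        System.out.println(summarize("Lucene is a full-text search library written in Java.", 20));
    }
}
```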

RE: Document.setBoost() doesn't work

2008-02-27 Thread John Griffin
Soren, Your documents are being boosted. Because of the way document boost values immediately go through some calculations and are stored in the index, Luke will always show 1.0 as the boost value. There has been some talk in the recent past that this should be removed from Luke since it is actuall

Re: How do i get a text summary

2008-02-27 Thread Seth Call
Am I missing something? Isn't this exactly what Lucene does? Put in a value when you create your Document, get it back out when it comes back from a search, right? Want a text summary? Put it into the document... I just started playing with Lucene, so maybe I'm missing something, but these ques

How do i get a text summary

2008-02-27 Thread Ravinder.Teepiredddy
Hi All, Is there a way to get a text summary of an indexed document to display along with the search result? Please let me know what changes are needed. Thanks, Ravinder

Re: Inconsistent Search Speed

2008-02-27 Thread Grant Ingersoll
Ah, you didn't mention term vectors. What do you need them for? Perhaps a bit more background could help here. -Grant On Feb 27, 2008, at 1:31 PM, fangz wrote: I implemented HitCollector as you suggested. It improved the initial run significantly. However, it only showed slight improveme

filter issue

2008-02-27 Thread Rong Shen
Hi List, I have a situation similar to indexing a mailing list, with each mail indexed as a Doc. Mails from the same thread share the same thread ID, which is indexed in a separate field. Now I want to search through all the mails using some keywords, and list all the unique thread IDs which I can pas
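
Once the matching mails are known, listing unique threads comes down to de-duplicating their thread IDs while keeping rank order. The plain-Java sketch below assumes each hit already carries its thread ID. How that value is read (for example, from a stored field) is left out, and the `Hit` class is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ThreadIdSketch {

    // One search hit: the mail's document number and its thread ID (hypothetical holder).
    static final class Hit {
        final int doc;
        final String threadId;
        Hit(int doc, String threadId) { this.doc = doc; this.threadId = threadId; }
    }

    // Collapses ranked hits to distinct thread IDs in order of first appearance,
    // so each thread is listed where its best-ranked mail was.
    static List<String> uniqueThreads(List<Hit> rankedHits) {
        Set<String> seen = new LinkedHashSet<>();
        for (Hit h : rankedHits) {
            seen.add(h.threadId);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(7, "t42"), new Hit(3, "t17"), new Hit(9, "t42"), new Hit(1, "t05"));
        System.out.println(uniqueThreads(hits));
    }
}
```

A LinkedHashSet keeps insertion order, so no separate sort step is needed after de-duplication.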

Document.setBoost() doesn't work

2008-02-27 Thread Soeren Pekrul
I work with Lucene 2.0. I boost some documents: Document doc = new Document(); /* adding fields */ doc.setBoost(2.0f); indexwriter.addDocument(doc); If I look at my index with Luke (0.6), the boost value of all documents is still 1.0. How can I boost documents? Thanks. Sören

Re: Atomicity and AutoCommit

2008-02-27 Thread Michael McCandless
Simon Wistow wrote: On Wed, Feb 27, 2008 at 09:38:55AM -0500, Michael McCandless said: When you previously saw corruption was it due to an OS or machine crash (or power cord got pulled)? If so, you were likely hitting LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4 at s

Re: Atomicity and AutoCommit

2008-02-27 Thread Mark Miller
You need to make sure your storage does not lie in response to an fsync command. If it does (most commercial stuff does), you cannot guarantee no corruption. Search Google for "your harddrive lies to you" or something. It shouldn't be that hard to take the patch from the issue and apply it to a
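
On the Java side, "fsync" corresponds to `FileChannel.force(true)` (or `FileDescriptor.sync()`). The sketch below writes a file and forces it to the device. As the post points out, this only asks the OS and drive to persist the data. If the drive acknowledges while the data is still in a volatile cache, no Java call can detect that. The file name and method name are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FsyncSketch {

    // Writes the content, then asks the OS to flush data and metadata to the device.
    // Durability still depends on the hardware honouring the flush.
    static void writeDurably(Path file, String content) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // true: also flush metadata such as the file length
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fsync-demo", ".tmp");
        writeDurably(tmp, "commit point");
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```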

Re: Atomicity and AutoCommit

2008-02-27 Thread Simon Wistow
On Wed, Feb 27, 2008 at 09:38:55AM -0500, Michael McCandless said: > > When you previously saw corruption was it due to an OS or machine > crash (or power cord got pulled)? If so, you were likely hitting > LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4 > at some point) but

Re: Inconsistent Search Speed

2008-02-27 Thread fangz
I implemented HitCollector as you suggested. It improved the initial run significantly. However, it only showed a slight improvement in the subsequent runs. I don't know how to implement FieldSelector in my situation. My code looks like this: public void collect( int doc, float score ) { TermFr
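
The point of a collector is that `collect(int doc, float score)` runs once for every matching document, so it should do only cheap work. Expensive per-document work (loading term vectors or stored fields) is better deferred until the final top N are known. The plain-Java sketch below mirrors that callback shape with a bounded min-heap. It is a stand-in class, not Lucene's HitCollector, though Lucene's own top-N search works on the same principle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class TopNCollectorSketch {

    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    // Min-heap on score: the weakest of the current top N sits at the head.
    private final PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopNCollectorSketch(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
    }

    // Called once per matching document: O(log n) work, no document loading.
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new ScoredDoc(doc, score));
        }
    }

    // Doc numbers of the collected top N, best score first; load fields only for these.
    List<Integer> topDocs() {
        List<ScoredDoc> all = new ArrayList<>(heap);
        all.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (ScoredDoc sd : all) docs.add(sd.doc);
        return docs;
    }

    public static void main(String[] args) {
        TopNCollectorSketch c = new TopNCollectorSketch(2);
        float[] scores = {0.3f, 1.2f, 0.9f, 0.1f};
        for (int doc = 0; doc < scores.length; doc++) c.collect(doc, scores[doc]);
        System.out.println(c.topDocs());
    }
}
```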

explain() - fieldnorm

2008-02-27 Thread JensBurkhardt
Hey everybody, As my subject says, I have a little problem with analyzing the explain() output. I know that the fieldnorm value consists of "documentboost, fieldboost and lengthNorm". Is it possible to retrieve the single values? I know that they are multiplied while indexing but can t
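
Assuming Lucene 2.x's DefaultSimilarity, where lengthNorm is 1/√numTerms, the fieldNorm shown by explain() is roughly the product below. With several same-named fields in one document, their boosts multiply. The product is encoded into a single byte at index time, so only this lossy product survives; the individual factors cannot be read back from the index. Recovering them would mean recording them separately when indexing, for example by storing the boost in its own field (a hypothetical workaround). The worked example shows that a document boosted to 2.0 with a 4-term field ends up with a fieldNorm of exactly 1.0.

```latex
\mathrm{fieldNorm}(f, d) \approx \mathrm{boost}(d) \cdot \mathrm{boost}(f) \cdot \frac{1}{\sqrt{\mathrm{numTerms}(f, d)}}
\qquad \text{e.g.} \qquad
2.0 \cdot 1.0 \cdot \frac{1}{\sqrt{4}} = 1.0
```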

Re: lucene java OOM while sorting more than one field

2008-02-27 Thread Erick Erickson
The first question is always "how much memory are you giving your JVM?". A 256 MB index is pretty small; I wouldn't be surprised if your JVM is using some very small default. Best, Erick On Wed, Feb 27, 2008 at 6:23 AM, GURUPRASAD MS <[EMAIL PROTECTED]> wrote: > Lucene Index contains 2.1 Milli

Re: Inconsistent Search Speed

2008-02-27 Thread Erick Erickson
To reinforce Grant's comment, lazy loading improved one situation for me on the order of 10X. I wrote it up and it's somewhere in the Wiki. Your results will vary, and unless you have a LOT of stored fields I wouldn't necessarily expect a similar speedup, but it's sure worth looking at. And don't

Re: Atomicity and AutoCommit

2008-02-27 Thread Michael McCandless
When you previously saw corruption was it due to an OS or machine crash (or power cord got pulled)? If so, you were likely hitting LUCENE-1044, which is fixed on the trunk version of Lucene (to be 2.4 at some point) but is not fixed in 2.3. If that is what you were hitting, then unfortunately n

lucene java OOM while sorting more than one field

2008-02-27 Thread GURUPRASAD MS
The Lucene index contains 2.1 million records (indexed from 2.1 million rows in a SQL Server database). Index file size: 256 MB. Lucene version: 2.3. Searching works fine when we sort the results on a single field. However, if the search results are sorted on more than one field, we get an Out of Memory exc
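
Memory for sorting depends on the number of documents and on each sort field's distinct values, not on the index file size. If the sort fields are strings, Lucene 2.3 typically caches, per sorted field, an int ordinal for every document plus the field's distinct values, and keeps that cache for the life of the IndexReader. Each additional sort field adds another cache. The sketch below is a back-of-envelope estimator under those assumptions. The 40-byte per-String overhead and the 20-character "every value distinct" case are assumptions, not measurements. It also prints the JVM's current heap limit via `Runtime.maxMemory()`, which `-Xmx` raises.

```java
final class SortMemorySketch {

    // Rough bytes for one string sort field: an int per document plus one String
    // per distinct value. The per-String overhead is an assumed, JVM-dependent figure.
    static long estimateBytes(long maxDoc, long uniqueValues, double avgChars) {
        final long ordinalBytes = 4L * maxDoc;
        final double assumedStringOverhead = 40.0; // headers and fields, approximate
        double stringBytes = uniqueValues * (assumedStringOverhead + 2.0 * avgChars);
        return ordinalBytes + (long) stringBytes;
    }

    public static void main(String[] args) {
        long maxDoc = 2_100_000L; // document count from the post
        // Hypothetical worst case: every document has a distinct 20-character value.
        long perField = estimateBytes(maxDoc, maxDoc, 20.0);
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("per sorted field ~%d MB, two fields ~%d MB, max heap %d MB%n",
                perField >> 20, (2 * perField) >> 20, maxHeap >> 20);
    }
}
```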

Atomicity and AutoCommit

2008-02-27 Thread Simon Wistow
I currently have a set up that indexes into RAM and then periodically merges that into a disk based index. Searches are done from the disk based index and deletes are handled by keeping a list of deleted documents, filtering out search results and applying the deletes to the index at merge tim
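
The setup described above can be modelled in a few lines: new documents go to a RAM buffer, searches read only the disk index, deletes are recorded in a pending set that filters results, and a periodic merge applies the deletes before moving the buffered documents over. The plain-Java sketch below uses maps as hypothetical stand-ins for both indexes. In this sketch the pending deletes live only in memory, so a crash before `merge()` would lose them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PendingDeletesSketch {

    // Stand-in for the disk index: document ID -> content.
    private final Map<String, String> diskIndex = new LinkedHashMap<>();
    // Stand-in for the RAM index of documents added since the last merge.
    private final Map<String, String> ramBuffer = new LinkedHashMap<>();
    // Deletes recorded since the last merge, not yet applied to the disk index.
    private final Set<String> pendingDeletes = new HashSet<>();

    void add(String id, String content) {
        ramBuffer.put(id, content);
    }

    void delete(String id) {
        pendingDeletes.add(id);
        ramBuffer.remove(id); // an unmerged copy can simply be dropped
    }

    // Searches use the disk index only; pending deletes are filtered out of the hits.
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : diskIndex.entrySet()) {
            if (e.getValue().contains(word) && !pendingDeletes.contains(e.getKey())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Periodic merge: apply deletes first, then move the buffered documents to disk,
    // so a delete followed by a re-add of the same ID keeps the new copy.
    void merge() {
        diskIndex.keySet().removeAll(pendingDeletes);
        pendingDeletes.clear();
        diskIndex.putAll(ramBuffer);
        ramBuffer.clear();
    }
}
```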

Re: Lucene Search Performance

2008-02-27 Thread Michael Prichard
I'm wondering if your date field's precision may be a little too much? What I mean is that you are going all the way down to seconds. Whenever you do a range query you are essentially spawning a BooleanQuery with a representation of that range. Do you really need to be that precise? I u
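
The cost described above grows with the number of distinct terms inside the queried range. The plain-Java sketch below counts how many distinct terms a set of timestamps produces at second precision versus day precision. The data (one document every 10 seconds for 30 days, starting 2008-02-01 UTC) is hypothetical and chosen only to make the contrast visible.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

final class PrecisionSketch {

    // Number of distinct index terms the timestamps yield under the given pattern.
    // A range query expanded into one clause per term grows with this count.
    static int distinctTerms(long[] epochSeconds, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        Set<String> terms = new HashSet<>();
        for (long s : epochSeconds) {
            terms.add(fmt.format(Instant.ofEpochSecond(s)));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        int n = 30 * 24 * 360;        // one document every 10 seconds for 30 days
        long start = 1_201_824_000L;  // 2008-02-01T00:00:00Z
        long[] times = new long[n];
        for (int i = 0; i < n; i++) times[i] = start + 10L * i;
        System.out.println("second precision: " + distinctTerms(times, "yyyyMMddHHmmss"));
        System.out.println("day precision:    " + distinctTerms(times, "yyyyMMdd"));
    }
}
```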

Re: Inconsistent Search Speed

2008-02-27 Thread Grant Ingersoll
You could also look at the FieldSelector when getting the Document, so that you only load the one field you need. -Grant On Feb 26, 2008, at 10:13 PM, Mark Miller wrote: The Lucene prime directive: don't iterate through all of Hits! It's horribly inefficient. You must use a HitCollector. Ev

Re: Security filtering from external DB

2008-02-27 Thread Gabriel Landais
h t wrote: I guess you can implement createBitSet() more efficiently by using Filer rather than BooleanQuery. Hi, thanks for the advice, but did you mean Filter or Filer? And even if I should use a Filter, I don't really understand how to replace the Boolean query :( The boolean query is already very
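
In Lucene 2.x a Filter hands back a `java.util.BitSet` with one bit per document number. The permission check can therefore set bits directly instead of building a BooleanQuery with one clause per allowed ID. The plain-Java sketch below shows only that bit-setting step. It assumes a hypothetical array mapping each document number to its database key, plus the set of keys the external DB allows for the current user. How the array is filled is not shown, and since document numbers change when the index changes, such a bitset would be rebuilt (or cached, e.g. with CachingWrapperFilter) per IndexReader.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

final class SecurityBitsSketch {

    // docExternalIds[i] is the database key of document number i (hypothetical mapping).
    // allowed holds the keys the current user may see, as returned by the external DB.
    static BitSet allowedDocs(String[] docExternalIds, Set<String> allowed) {
        BitSet bits = new BitSet(docExternalIds.length);
        for (int doc = 0; doc < docExternalIds.length; doc++) {
            if (allowed.contains(docExternalIds[doc])) {
                bits.set(doc); // a hit survives filtering only if its bit is set
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] ids = {"A", "B", "C", "D"};
        Set<String> allowed = new HashSet<>(Arrays.asList("B", "D"));
        System.out.println(allowedDocs(ids, allowed));
    }
}
```

Each lookup is a hash-set probe, so the cost is one pass over the documents regardless of how many IDs the user may see.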