Re: Using lucene as a database... good idea or bad idea?

2008-07-28 Thread Hasan Diwan
Check the Nutch or Solr projects, both of which are subprojects of Lucene. Feel free to drop me a line if you should run into difficulties. Sent via BlackBerry by AT&T -----Original Message----- From: "John Evans" <[EMAIL PROTECTED]> Date: Mon, 28 Jul 2008 18:53:08 To: Subject: Using lucene as

Re: Creating an index from an XML file using Lucene in Java

2008-07-28 Thread syedfa
Dear Karsten: Sorry for the multiple posts, but I have made some progress. I think in order to search multiple fields, I should be using the MultiFieldQueryParser class, and simply pass a String array containing the fields I wish to search over. My follow-up question to you is this: How do

RE: Index optimization ...

2008-07-28 Thread John Griffin
Use IndexWriter.setRAMBufferSizeMB(double mb) and you won't have to sacrifice anything. It defaults to 16.0 MB, so depending on the size of your index you may want to make it larger. Do some testing at various values to see where the sweet spot is. John G. -----Original Message----- From: Dragon

Using lucene as a database... good idea or bad idea?

2008-07-28 Thread John Evans
Hi All, I have successfully used Lucene in the "traditional" way to provide full-text search for various websites. Now I am tasked with developing a data-store to back a web crawler. The crawler can be configured to retrieve arbitrary fields from arbitrary pages, so the result is that each docum

How to get unique values for field1 where search is on field2?

2008-07-28 Thread senthil kumaran
Hi, I've indexed Book Title, Author Name, Contents and some other fields. Previously I gave the option to search for a string in any of those fields, and I displayed results by getting the "Title", "Author Name" and "Contents" fields from the documents in the hits. Now I want to display "Title" & "Author Name" list w

Index optimization ...

2008-07-28 Thread Dragon Fly
I'd like to shorten the time it takes to optimize my index and am willing to sacrifice search and indexing performance. Which parameters (e.g. merge factor) should I change? Thank you.

Re: Creating an index from an XML file using Lucene in Java

2008-07-28 Thread syedfa
Hi Karsten: I have another follow-up question for you. Once I create the index the way you suggested, how would I modify my code to search it? At present, I have the following code: Analyzer analyser = new StandardAnalyzer(); Query parser=new QueryParser("LINES", analyser).pa

RE: Fastest way to get just the "bits" of matching documents

2008-07-28 Thread Robert Stewart
BTW, we use Lucene .NET not Java currently, so the version is 1.9. Unfortunately we don’t have "setAllowDocsOutOfOrder" but do have "useScorer14", which is almost the same thing for some queries. I did not see much improvement, and for other queries it was slower. We are stuck on 1.9 due to some stabil

Re: Creating an index from an XML file using Lucene in Java

2008-07-28 Thread syedfa
Thanks Karsten for your reply. I will implement your solution tonight, however I did have a quick follow up question. I understand how you are implementing the solution for the "SCENE-COMMENTARY" tag, however because at present I am working with the "LINES" tag, shouldn't I continue using that i

Re: Lucene Search Error: Java.io.IOException: Bad file descriptor

2008-07-28 Thread Michael McCandless
The description here sounds exactly like what we were seeing before LUCENE-669 was fixed -- from his writeup it doesn't look like he tested with Lucene 2.2 to see if the problem went away. I think it very well may. That said, as a precaution, maybe we should no longer call close() on o

Re: Lucene performance issues..

2008-07-28 Thread Michael McCandless
Perhaps one thing to try is a partial optimize (IndexWriter.optimize(int maxNumSegments)). It makes optimize faster, but searches may run slower than after a full optimize. E.g., optimize(5) will reduce the index to <= 5 segments. Mike Stu Hood wrote: Also, keep in mind that optimization is a very

Re: How to use lucene for high search performance ?

2008-07-28 Thread Michael McCandless
Yes you can, and that should be fast. Another thing to try is an SSD -- look at the "Lucene performance issues" thread on java-user. Mike On Jul 27, 2008, at 11:54 PM, 王建新 wrote: Thanks a lot. I have an idea: can I use Lucene on a 64-bit VM? In that case, I can load all index files t

Re: Query in IndexWriter.deleteDocuments(Term term)

2008-07-28 Thread Michael McCandless
Ahh gotchya, OK. Mike Ajay Garg wrote: Thanks Mike. Yes, I know, 2.3.2 doesn't have commit(). That's why I asked whether commit = close + new IndexWriter, because then I can write a commit() method, encapsulating close() + new IndexWriter. Thanks a ton for the prompt replies.. Ajay Gar

Re: Creating an index from an XML file using Lucene in Java

2008-07-28 Thread Karsten F.
Hi Fayyaz, again, this is about SAX-Handler not about lucene. My understanding of what you want: 1. one lucene document for each SPEECH-Element (already implemented) 2. one lucene document for each SCENE-COMMENTARY-Element (not implemented yet). correct? If yes, you can write i

Re: Lucene performance issues..

2008-07-28 Thread Toke Eskildsen
On Sun, 2008-07-27 at 21:38 +0100, Mazhar Lateef wrote: > * email searching > o We are creating very large indexes for emails we are > processing, the size is upto +150GB for indexes only (not > including data content), this we thought would improve > search

Re: Lucene performance issues..

2008-07-28 Thread ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
Not an answer to your question. But, have you tried IBM's OmniFind Personal Email Search? Excerpt from their site: Simple keyword or text search is not always effective for quickly finding what you need. IBM(R) has gone beyond keywords by inventing a fast and accurate semantic search system for