Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Chris Lu
It surely is possible. AFAIK, LinkedIn uses Lucene to store some data. But a Lucene index is, in a sense, similar to a database index. Both are data structures for a specialized and limited query execution path. So this depends on your application's queries and how you create the Lucene index. The norma

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Matthew Hall
Yeah, we do the same thing here for indexes of up to 57M documents (rows), and that's just one part of our implementation. It takes quite a bit of wrangling to use Lucene in this manner, but we've found it to be utterly worthwhile. Matt Ian Lea wrote: John I think it's a great idea,

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Bill Janssen
I do this with uplib (http://uplib.parc.com/) with fair success. Originally I thought I'd need Lucene plus a relational database to store metadata about the documents for metadata searches. So far, though, I've been able to store the metadata in Lucene and use the same Lucene DB for both metadata

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
"Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene. Connect data storage with simple, fast lookup and Lucene." Thanks, Grant for the clarification. I see now. Nagesh On Tue, Jul 29, 2008 at 7:55 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote: > Don't connect "database" (i.e. SQL,

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Grant Ingersoll
Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene. Connect data storage with simple, fast lookup and Lucene. One field is the key (i.e. the filename); the other field is a binary, stored Field containing the contents of the file. Of course, there are other ways of slicing
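
[Editor's sketch] The two-field layout described in the message above (one key field, one binary stored field, fetched by an exact-key lookup) can be modelled in plain Java. This is a toy illustration only and does not use or reproduce the Lucene API: the class name ToyStoredFieldIndex and its methods put, get and size are hypothetical, and the replace-on-duplicate-key behaviour is an assumption made for the sketch, not something stated in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (NOT Lucene): each "document" has one indexed key field and
 * one stored binary field. All names here are hypothetical.
 */
final class ToyStoredFieldIndex {
    // "Stored field": the raw bytes of each document, addressed by doc id.
    private final List<byte[]> storedContents = new ArrayList<>();
    // "Indexed field": exact key term -> doc id.
    private final Map<String, Integer> keyToDocId = new HashMap<>();

    /** Adds a document, or replaces the one already stored under this key (an assumption of this sketch). */
    public void put(String key, byte[] contents) {
        Integer existing = keyToDocId.get(key);
        if (existing != null) {
            storedContents.set(existing, contents.clone());
        } else {
            keyToDocId.put(key, storedContents.size());
            storedContents.add(contents.clone());
        }
    }

    /** "Searches" for the single document with this exact key; returns null if absent. */
    public byte[] get(String key) {
        Integer docId = keyToDocId.get(key);
        return docId == null ? null : storedContents.get(docId).clone();
    }

    /** Number of distinct keys currently stored. */
    public int size() {
        return keyToDocId.size();
    }

    public static void main(String[] args) {
        ToyStoredFieldIndex index = new ToyStoredFieldIndex();
        index.put("notes.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] found = index.get("notes.txt");
        System.out.println(new String(found, StandardCharsets.UTF_8));
    }
}
```

The sketch only shows the shape of the idea: the key is looked up as an exact term, and the payload comes back from per-document storage. A real Lucene index adds analysis, ranking, segments and on-disk persistence, none of which is modelled here.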

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Aravind . Yarram
Look at the Compass wrapper for Lucene... Regards, Aravind R Yarram Enabling Technologies Equifax Information Services LLC 1525 Windward Concourse, J42E Alpharetta, GA 30005 desk: 770 740 6951 email: [EMAIL PROTECTED] "ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)" <[EMAIL PROTECTED]> 07/29/2008 10:02 AM Please

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
Hi Ian, Yes, I see that we are discussing an "option" here. But, as I said before (the three parts of a search-based solution), I do not know (but would like to know) how Lucene (Java only, not Nutch, Solr, etc.) can be used as a datastore. Basically, I am not able to connect "database" and Lucen

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Grant Ingersoll
Agreed, no one is saying "should". Additionally, Lucene can be faster for a number of things like storage when databases are overkill (i.e. you don't need transactions, complex joins, etc.). After all, even the lookup of a file can be viewed as a "search", even if it is just for a single un

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Ian Lea
I don't think that anyone in this thread has said "should", just "could" - it is a valid option (IMHO). Personally, I use it as a store for Lucene-related data because I know and like and trust it, it is already there for this project so no need to introduce another software dependency, and becaus

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
The way I see it, search solutions (on whatever scale) have three components: data aggregation, indexing/searching, and presentation of results. I thought Lucene did only the second part. So I do not quite follow why Lucene should be used as a datastore. Nagesh On Tue, Jul 29, 2008 at 6:01 PM

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Grant Ingersoll
I think the answer is it can be done and probably quite well. I also think it's informative that Nutch does not use Lucene for this function, as I understand it, but that shouldn't stop you either. You might also have a look at Apache Jackrabbit, which uses Lucene underneath as a content

Re: Index optimization ...

2008-07-29 Thread Asbjørn A . Fellinghaug
John Griffin: > Use IndexWriter.setRAMBufferSizeMB(double mb) and you won't have to > sacrifice anything. It defaults to 16.0 MB so depending on the size of your > index you may want to make it larger. Do some testing at various values to > see where the sweet spot is. > Also, have a look at htt

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Ganesh - yahoo
Hello all, I am also interested in this. I want to archive the content of the document using Lucene. Is it a good idea to use Lucene as a storage engine? Regards Ganesh - Original Message - From: "Ian Lea" <[EMAIL PROTECTED]> To: Sent: Tuesday, July 29, 2008 2:18 PM Subject: Re: Usi

Re: Index optimization ...

2008-07-29 Thread Shalin Shekhar Mangar
Try IndexWriter.optimize(int maxNumSegments) On Mon, Jul 28, 2008 at 11:30 PM, Dragon Fly <[EMAIL PROTECTED]> wrote: > I'd like to shorten the time it takes to optimize my index and am willing > to sacrifice search and indexing performance. Which parameters (e.g. merge > factor) should I change?

Re: Using lucene as a database... good idea or bad idea?

2008-07-29 Thread Ian Lea
John, I think it's a great idea, and do exactly this to store 5 million+ documents with info that takes way too long to get out of our Oracle database (think days). Not as many docs as you are talking about, and less data for each doc, but I wouldn't have any concerns about scaling. There are