I don't think that anyone in this thread has said "should", just "could" - it is a valid option (IMHO). Personally, I use it as a store for Lucene-related data because I know, like and trust it, because it is already there for this project so there is no need to introduce another software dependency, and because it is blindingly fast.
--
Ian.

On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <[EMAIL PROTECTED]> wrote:
> The way I see it, search solutions (on whatever scale) have three components
> - data aggregation, indexing/searching and presentation of results. I
> thought, Lucene did the second part only.
>
> So, I do not quite follow, why should Lucene be used for datastore ?
>
> Nagesh
>
> On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
>> I think the answer is it can be done and probably quite well. I also think
>> it's informative that Nutch does not use Lucene for this function, as I
>> understand it, but that shouldn't stop you either. You might also have a
>> look at Apache Jackrabbit, which uses Lucene underneath as a content
>> repository.
>>
>> -Grant
>>
>> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
>>
>>> Hello all,
>>>
>>> I am also interested in this. I want to archive the content of the
>>> document using Lucene.
>>>
>>> Is it a good idea to use Lucene as storage engine?
>>>
>>> Regards
>>> Ganesh
>>>
>>> ----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
>>> To: <java-user@lucene.apache.org>
>>> Sent: Tuesday, July 29, 2008 2:18 PM
>>> Subject: Re: Using lucene as a database... good idea or bad idea?
>>>
>>>> John
>>>>
>>>> I think it's a great idea, and do exactly this to store 5 million+
>>>> documents with info that it takes way too long to get out of our
>>>> Oracle database (think days). Not as many docs as you are talking
>>>> about, and less data for each doc, but I wouldn't have any concerns
>>>> about scaling. There are certainly lucene indexes out there bigger
>>>> than what you propose. You can compress the stored data to save some
>>>> space. Run times for optimization might get interesting but see
>>>> recent threads for suggestions on that. And since you are not too
>>>> concerned about performance you may not need to optimize much, or even
>>>> at all.
>>>>
>>>> Of course you need to remember that this is not a DBMS solution in the
>>>> sense of transactions, recovery, etc. but I'm sure you are already
>>>> aware of that.
>>>>
>>>> --
>>>> Ian.
>>>>
>>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have successfully used Lucene in the "traditional" way to provide
>>>>> full-text search for various websites. Now I am tasked with developing a
>>>>> data-store to back a web crawler. The crawler can be configured to
>>>>> retrieve arbitrary fields from arbitrary pages, so the result is that
>>>>> each document may have a random assortment of fields. It seems like
>>>>> Lucene may be a natural fit for this scenario since you can obviously
>>>>> add arbitrary fields to each document and you can store the actual data
>>>>> in the database. I've done some research to make sure that it would meet
>>>>> all of our individual requirements (that we can iterate over documents,
>>>>> update (delete/replace) documents, etc.) and everything looks good. I've
>>>>> also seen a couple of references around the net to other people trying
>>>>> similar things... however, I know it's not meant to be used this way, so
>>>>> I thought I would post here and ask for guidance. Has anyone done
>>>>> something similar? Is there any specific reason to think this is a bad
>>>>> idea?
>>>>>
>>>>> The one thing that I am least certain about is how well it will scale.
>>>>> We may reach the point where we have tens of millions of documents and a
>>>>> high percentage of those documents may be relatively large (10k-50k each).
>>>>> We actually would NOT be expecting/needing Lucene's normal extremely
>>>>> fast text search times for this, but we would need reasonable times for
>>>>> adding new documents to the index, retrieving documents by ID (for
>>>>> iterating over all documents), optimizing the index after a series of
>>>>> changes, etc.
>>>>>
>>>>> Any advice/input/theories anyone can contribute would be greatly
>>>>> appreciated.
>>>>>
>>>>> Thanks,
>>>>> -
>>>>> John
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ