Hi Ian, Yes, I see that we are discussing an "option" here. But, as I said before (the three parts to search-based solution), I do not know (but, would like to know) how Lucene (java only - not Nutch, Solr, etc.) can be used as a datastore.
Basically, I am not able to connect "database" and Lucene java. :) Nagesh On Tue, Jul 29, 2008 at 6:51 PM, Ian Lea <[EMAIL PROTECTED]> wrote: > I don't think that anyone in this thread has said "should", just > "could" - it is a valid option (IMHO). Personally, I use it as a > store for lucene related data because I know and like and trust it, it > is already there for this project so no need to introduce another > software dependency, and because it is blindingly fast. > > > -- > Ian. > > > On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) > <[EMAIL PROTECTED]> wrote: > > The way I see it, search solutions (on whatever scale) have three > components > > - data aggregation, indexing/searching and presentation of results. I > > thought, Lucene did the second part only. > > > > So, I do not quite follow, why should Lucene be used for datastore ? > > > > Nagesh > > > > On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED] > >wrote: > > > >> I think the answer is it can be done and probably quite well. I also > think > >> it's informative that Nutch does not use Lucene for this function, as I > >> understand it, but that shouldn't stop you either. You might also have > a > >> look at Apache Jackrabbit, which uses Lucene underneath as a content > >> repository. > >> > >> -Grant > >> > >> > >> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote: > >> > >> Hello all, > >>> > >>> I am also interested in this. I want to archive the content of the > >>> document using Lucene. > >>> > >>> Is it a good idea to use Lucene as storage engine? > >>> > >>> Regards > >>> Ganesh > >>> > >>> ----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]> > >>> To: <java-user@lucene.apache.org> > >>> Sent: Tuesday, July 29, 2008 2:18 PM > >>> Subject: Re: Using lucene as a database... good idea or bad idea? > >>> > >>> > >>> John > >>>> > >>>> > >>>> I think it's a great idea, and do exactly this to store 5 million+ > >>>> documents with info that it takes way too long to get out of our > >>>> Oracle database (think days). Not as many docs as you are talking > >>>> about, and less data for each doc, but I wouldn't have any concerns > >>>> about scaling. There are certainly lucene indexes out there bigger > >>>> than what you propose. You can compress the stored data to save some > >>>> space. Run times for optimization might get interesting but see > >>>> recent threads for suggestions on that. And since you are not too > >>>> concerned about performance you may not need to optimize much, or even > >>>> at all. > >>>> > >>>> Of course you need to remember that this is not a DBMS solution in the > >>>> sense of transactions, recovery, etc. but I'm sure you are already > >>>> aware of that. > >>>> > >>>> > >>>> -- > >>>> Ian. > >>>> > >>>> > >>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote: > >>>> > >>>>> Hi All, > >>>>> > >>>>> I have successfully used Lucene in the "tradtiional" way to provide > >>>>> full-text search for various websites. Now I am tasked with > developing > >>>>> a > >>>>> data-store to back a web crawler. The crawler can be configured to > >>>>> retrieve > >>>>> arbitrary fields from arbitrary pages, so the result is that each > >>>>> document > >>>>> may have a random assortment of fields. It seems like Lucene may be > a > >>>>> natural fit for this scenario since you can obviously add arbitrary > >>>>> fields > >>>>> to each document and you can store the actually data in the database. > >>>>> I've > >>>>> done some research to make sure that it would meet all of our > individual > >>>>> requirements (that we can iterate over documents, update > >>>>> (delete/replace) > >>>>> documents, etc.) and everything looks good. I've also seen a couple > of > >>>>> references around the net to other people trying similar things... > >>>>> however, > >>>>> I know it's not meant to be used this way, so I thought I would post > >>>>> here > >>>>> and ask for guidance? Has anyone done something similar? Is there > any > >>>>> specific reason to think this is a bad idea? > >>>>> > >>>>> The one thing that I am least certain about his how well it will > scale. > >>>>> We > >>>>> may reach the point where we have tens of millions of documents and a > >>>>> high > >>>>> percentage of those documents may be relatively large (10k-50k each). > >>>>> We > >>>>> actually would NOT be expecting/needing Lucene's normal extreme fast > >>>>> text > >>>>> search times for this, but we would need reasonable times for adding > new > >>>>> documents to the index, retrieving documents by ID (for iterating > over > >>>>> all > >>>>> documents), optimizing the index after a series of changes, etc. > >>>>> > >>>>> Any advice/input/theories anyone can contribute would be greatly > >>>>> appreciated. > >>>>> > >>>>> Thanks, > >>>>> - > >>>>> John > >>>>> > >>>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: [EMAIL PROTECTED] > >>>> For additional commands, e-mail: [EMAIL PROTECTED] > >>>> > >>> > >>> Send instant messages to your online friends > >>> http://in.messenger.yahoo.com > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: [EMAIL PROTECTED] > >>> For additional commands, e-mail: [EMAIL PROTECTED] > >>> > >>> > >> -------------------------- > >> Grant Ingersoll > >> http://www.lucidimagination.com > >> > >> Lucene Helpful Hints: > >> http://wiki.apache.org/lucene-java/BasicsOfPerformance > >> http://wiki.apache.org/lucene-java/LuceneFAQ > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > > >