Re: Using lucene as a database... good idea or bad idea?

ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) Tue, 29 Jul 2008 07:02:45 -0700

Hi Ian,
Yes, I see that we are discussing an "option" here.

But, as I said before (the three parts of a search-based solution), I do not
know, but would like to know, how Lucene (Java only, not Nutch, Solr, etc.)
can be used as a datastore.

Basically, I am not able to connect "database" and Lucene Java. :)

Nagesh
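
One way to picture the "database" side of the question: the operations John
lists further down (documents with arbitrary fields, update as delete/replace,
retrieval by ID, iteration) amount to a keyed document store. The following is
only a toy sketch in plain Java to make those semantics concrete. It is not the
Lucene API, and ToyDocStore and all of its method names are made up for
illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of the datastore operations discussed in this thread.
// This is NOT the Lucene API; ToyDocStore and its methods are hypothetical.
class ToyDocStore {
    // id -> stored fields; each document may carry a different set of fields
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    // Update as delete/replace: drop any document with this id, then add the new one.
    void update(String id, Map<String, String> fields) {
        docs.remove(id);
        docs.put(id, new LinkedHashMap<>(fields));
    }

    // Retrieve the stored fields for an id, or null if no such document exists.
    Map<String, String> get(String id) {
        Map<String, String> fields = docs.get(id);
        return fields == null ? null : Collections.unmodifiableMap(fields);
    }

    // Delete by id; returns true if a document was removed.
    boolean delete(String id) {
        return docs.remove(id) != null;
    }

    // Iterate over all document ids (read-only view).
    Iterable<String> ids() {
        return Collections.unmodifiableSet(docs.keySet());
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyDocStore store = new ToyDocStore();

        Map<String, String> page = new LinkedHashMap<>();
        page.put("url", "http://example.com/a");
        page.put("title", "Page A");
        store.update("a", page);

        // A second document with a different assortment of fields
        Map<String, String> other = new LinkedHashMap<>();
        other.put("price", "9.99");
        store.update("b", other);

        // Re-crawl of "a": the old version is replaced, not duplicated
        page.put("title", "Page A (updated)");
        store.update("a", page);

        for (String id : store.ids()) {
            System.out.println(id + " -> " + store.get(id));
        }
    }
}
```

In Lucene itself the rough counterparts would, as far as I understand it, be
stored fields on a Document plus IndexWriter.updateDocument keyed on a unique
ID term, with none of the transaction or recovery guarantees of a DBMS.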


On Tue, Jul 29, 2008 at 6:51 PM, Ian Lea <[EMAIL PROTECTED]> wrote:

> I don't think that anyone in this thread has said "should", just
> "could" - it is a valid option (IMHO).  Personally, I use it as a
> store for lucene related data because I know and like and trust it, it
> is already there for this project so no need to introduce another
> software dependency, and because it is blindingly fast.
>
>
> --
> Ian.
>
>
> On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
> <[EMAIL PROTECTED]> wrote:
> > The way I see it, search solutions (on whatever scale) have three
> > components - data aggregation, indexing/searching and presentation of
> > results. I thought Lucene did the second part only.
> >
> > So I do not quite follow why Lucene should be used as a datastore.
> >
> > Nagesh
> >
> > On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> >
> >> I think the answer is it can be done and probably quite well.  I also think
> >> it's informative that Nutch does not use Lucene for this function, as I
> >> understand it, but that shouldn't stop you either.  You might also have a
> >> look at Apache Jackrabbit, which uses Lucene underneath as a content
> >> repository.
> >>
> >> -Grant
> >>
> >>
> >> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
> >>
> >>  Hello all,
> >>>
> >>> I am also interested in this. I want to archive the content of the
> >>> document using Lucene.
> >>>
> >>> Is it a good idea to use Lucene as storage engine?
> >>>
> >>> Regards
> >>> Ganesh
> >>>
> >>> ----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
> >>> To: <java-user@lucene.apache.org>
> >>> Sent: Tuesday, July 29, 2008 2:18 PM
> >>> Subject: Re: Using lucene as a database... good idea or bad idea?
> >>>
> >>>
> >>>  John
> >>>>
> >>>>
> >>>> I think it's a great idea, and do exactly this to store 5 million+
> >>>> documents with info that it takes way too long to get out of our
> >>>> Oracle database (think days).  Not as many docs as you are talking
> >>>> about, and less data for each doc, but I wouldn't have any concerns
> >>>> about scaling.  There are certainly lucene indexes out there bigger
> >>>> than what you propose.  You can compress the stored data to save some
> >>>> space.  Run times for optimization might get interesting but see
> >>>> recent threads for suggestions on that.  And since you are not too
> >>>> concerned about performance you may not need to optimize much, or even
> >>>> at all.
> >>>>
> >>>> Of course you need to remember that this is not a DBMS solution in the
> >>>> sense of transactions, recovery, etc. but I'm sure you are already
> >>>> aware of that.
> >>>>
> >>>>
> >>>> --
> >>>> Ian.
> >>>>
> >>>>
> >>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I have successfully used Lucene in the "traditional" way to provide
> >>>>> full-text search for various websites.  Now I am tasked with developing a
> >>>>> data-store to back a web crawler.  The crawler can be configured to retrieve
> >>>>> arbitrary fields from arbitrary pages, so the result is that each document
> >>>>> may have a random assortment of fields.  It seems like Lucene may be a
> >>>>> natural fit for this scenario since you can obviously add arbitrary fields
> >>>>> to each document and you can store the actual data in the database.  I've
> >>>>> done some research to make sure that it would meet all of our individual
> >>>>> requirements (that we can iterate over documents, update (delete/replace)
> >>>>> documents, etc.) and everything looks good.  I've also seen a couple of
> >>>>> references around the net to other people trying similar things... however,
> >>>>> I know it's not meant to be used this way, so I thought I would post here
> >>>>> and ask for guidance.  Has anyone done something similar?  Is there any
> >>>>> specific reason to think this is a bad idea?
> >>>>>
> >>>>> The one thing that I am least certain about is how well it will scale.  We
> >>>>> may reach the point where we have tens of millions of documents and a high
> >>>>> percentage of those documents may be relatively large (10k-50k each).  We
> >>>>> actually would NOT be expecting/needing Lucene's normal extremely fast text
> >>>>> search times for this, but we would need reasonable times for adding new
> >>>>> documents to the index, retrieving documents by ID (for iterating over all
> >>>>> documents), optimizing the index after a series of changes, etc.
> >>>>>
> >>>>> Any advice/input/theories anyone can contribute would be greatly
> >>>>> appreciated.
> >>>>>
> >>>>> Thanks,
> >>>>> -
> >>>>> John
> >>>>>
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>>> For additional commands, e-mail: [EMAIL PROTECTED]
> >>>>
> >>>
> >>>
> >>>
> >> --------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com
> >>
> >> Lucene Helpful Hints:
> >> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> >> http://wiki.apache.org/lucene-java/LuceneFAQ
> >>
> >>
> >>
> >
>
