Re: Using lucene as a database... good idea or bad idea?

Ian Lea Tue, 29 Jul 2008 06:22:28 -0700

I don't think that anyone in this thread has said "should", just
"could" - it is a valid option (IMHO).  Personally, I use it as a
store for Lucene-related data because I know, like, and trust it; it
is already there for this project, so there is no need to introduce
another software dependency; and it is blindingly fast.



--
Ian.


On Tue, Jul 29, 2008 at 1:43 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
<[EMAIL PROTECTED]> wrote:
> The way I see it, search solutions (on whatever scale) have three
> components: data aggregation, indexing/searching, and presentation of
> results. I thought Lucene did only the second part.
>
> So I do not quite follow: why should Lucene be used as a datastore?
>
> Nagesh
>
> On Tue, Jul 29, 2008 at 6:01 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
>> I think the answer is it can be done and probably quite well.  I also think
>> it's informative that Nutch does not use Lucene for this function, as I
>> understand it, but that shouldn't stop you either.  You might also have a
>> look at Apache Jackrabbit, which uses Lucene underneath as a content
>> repository.
>>
>> -Grant
>>
>>
>> On Jul 29, 2008, at 5:34 AM, Ganesh - yahoo wrote:
>>
>>> Hello all,
>>>
>>> I am also interested in this. I want to archive the content of
>>> documents using Lucene.
>>>
>>> Is it a good idea to use Lucene as a storage engine?
>>>
>>> Regards
>>> Ganesh
>>>
>>> ----- Original Message ----- From: "Ian Lea" <[EMAIL PROTECTED]>
>>> To: <java-user@lucene.apache.org>
>>> Sent: Tuesday, July 29, 2008 2:18 PM
>>> Subject: Re: Using lucene as a database... good idea or bad idea?
>>>
>>>
>>>> John
>>>>
>>>>
>>>> I think it's a great idea, and do exactly this to store 5 million+
>>>> documents with info that takes way too long to get out of our
>>>> Oracle database (think days).  Not as many docs as you are talking
>>>> about, and less data for each doc, but I wouldn't have any concerns
>>>> about scaling.  There are certainly Lucene indexes out there bigger
>>>> than what you propose.  You can compress the stored data to save some
>>>> space.  Run times for optimization might get interesting but see
>>>> recent threads for suggestions on that.  And since you are not too
>>>> concerned about performance you may not need to optimize much, or even
>>>> at all.
>>>>
>>>> Of course you need to remember that this is not a DBMS solution in the
>>>> sense of transactions, recovery, etc. but I'm sure you are already
>>>> aware of that.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Tue, Jul 29, 2008 at 2:53 AM, John Evans <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have successfully used Lucene in the "traditional" way to provide
>>>>> full-text search for various websites.  Now I am tasked with
>>>>> developing a data-store to back a web crawler.  The crawler can be
>>>>> configured to retrieve arbitrary fields from arbitrary pages, so the
>>>>> result is that each document may have a random assortment of fields.
>>>>> It seems like Lucene may be a natural fit for this scenario since
>>>>> you can obviously add arbitrary fields to each document and you can
>>>>> store the actual data in the database.  I've done some research to
>>>>> make sure that it would meet all of our individual requirements
>>>>> (that we can iterate over documents, update (delete/replace)
>>>>> documents, etc.) and everything looks good.  I've also seen a couple
>>>>> of references around the net to other people trying similar
>>>>> things... however, I know it's not meant to be used this way, so I
>>>>> thought I would post here and ask for guidance.  Has anyone done
>>>>> something similar?  Is there any specific reason to think this is a
>>>>> bad idea?
>>>>>
>>>>> The one thing that I am least certain about is how well it will
>>>>> scale.  We may reach the point where we have tens of millions of
>>>>> documents and a high percentage of those documents may be relatively
>>>>> large (10k-50k each).  We actually would NOT be expecting/needing
>>>>> Lucene's normal extremely fast text search times for this, but we
>>>>> would need reasonable times for adding new documents to the index,
>>>>> retrieving documents by ID (for iterating over all documents),
>>>>> optimizing the index after a series of changes, etc.
>>>>>
>>>>> Any advice/input/theories anyone can contribute would be greatly
>>>>> appreciated.
>>>>>
>>>>> Thanks,
>>>>> -
>>>>> John
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>>
>>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>>
>>
>
