It surely is possible. AFAIK, LinkedIn uses Lucene to store some data.
But a Lucene index is, in a sense, similar to a database index. Both are data
structures for a specialized and limited query execution path.
So this depends on your application's queries, and how you create the Lucene
index. The norma
Yeah, we do the same thing here for indexes of up to 57M documents
(rows), and that's just one part of our implementation.
It takes quite a bit of wrangling to use Lucene in this manner, but
we've found it to be utterly worthwhile.
Matt
Ian Lea wrote:
John
I think it's a great idea,
I do this with uplib (http://uplib.parc.com/) with fair success.
Originally I thought I'd need Lucene plus a relational database to
store metadata about the documents for metadata searches. So far,
though, I've been able to store the metadata in Lucene and use the
same Lucene DB for both metadata
"Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene.
Connect data storage with simple, fast lookup and Lucene."
Thanks, Grant, for the clarification. I see now.
Nagesh
On Tue, Jul 29, 2008 at 7:55 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Don't connect "database" (i.e. SQL,
Don't connect "database" (i.e. SQL, transactions, etc.) and Lucene.
Connect data storage with simple, fast lookup and Lucene.
One field is the key (i.e. the filename) the other field is a binary,
stored Field containing the contents of the file. Of course, there
are other ways of slicing
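
To make the two-field layout described above concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene code and not Lucene's API: the class name KeyedBlobStore and its put/get methods are hypothetical, invented only for illustration. It assumes one exact-match key per document (e.g. the filename) and one stored binary payload (the file contents).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Lucene API): an exact-match key field plus a stored binary field. */
class KeyedBlobStore {
    // Stored payloads, addressed by an internal document number.
    private final List<byte[]> storedContents = new ArrayList<>();
    // Exact-match "index" on the key field: key -> document number.
    private final Map<String, Integer> keyIndex = new HashMap<>();

    /** Adds a document; a later put with the same key wins on lookup. */
    void put(String key, byte[] contents) {
        storedContents.add(contents.clone());
        keyIndex.put(key, storedContents.size() - 1);
    }

    /** Looks a document up by its key; returns null when the key is unknown. */
    byte[] get(String key) {
        Integer docId = keyIndex.get(key);
        return docId == null ? null : storedContents.get(docId).clone();
    }

    public static void main(String[] args) {
        KeyedBlobStore store = new KeyedBlobStore();
        store.put("notes.txt", "hello".getBytes(StandardCharsets.UTF_8));
        // Fetching by key is a "search" on a single exact term.
        System.out.println(new String(store.get("notes.txt"), StandardCharsets.UTF_8));
    }
}
```

In a real index the key would be a field indexed as a single untokenized term and the contents a stored binary field; the exact classes and constants for that depend on the Lucene version in use.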
Look at the Compass wrapper for Lucene...
Regards,
Aravind R Yarram
Enabling Technologies
Equifax Information Services LLC
1525 Windward Concourse, J42E
Alpharetta, GA 30005
desk: 770 740 6951
email: [EMAIL PROTECTED]
"ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)" <[EMAIL PROTECTED]>
07/29/2008 10:02 AM
Please
Hi Ian,
Yes, I see that we are discussing an "option" here.
But, as I said before (the three parts to a search-based solution), I do not
know (but, would like to know) how Lucene (java only - not Nutch, Solr,
etc.) can be used as a datastore.
Basically, I am not able to connect "database" and Lucen
Agreed, no one is saying "should". Additionally, Lucene can be faster
for a number of things like storage when databases are overkill (i.e.
you don't need transactions, complex joins, etc.). After all, even the
lookup of a file can be viewed as a "search", even if it is just for
a single un
I don't think that anyone in this thread has said "should", just
"could" - it is a valid option (IMHO). Personally, I use it as a
store for Lucene-related data because I know and like and trust it, it
is already there for this project so no need to introduce another
software dependency, and becaus
The way I see it, search solutions (on whatever scale) have three components:
data aggregation, indexing/searching and presentation of results. I
thought Lucene did the second part only.
So I do not quite follow why Lucene should be used as a datastore.
Nagesh
On Tue, Jul 29, 2008 at 6:01 PM
I think the answer is it can be done and probably quite well. I also
think it's informative that Nutch does not use Lucene for this
function, as I understand it, but that shouldn't stop you either. You
might also have a look at Apache Jackrabbit, which uses Lucene
underneath as a content
John Griffin:
> Use IndexWriter.setRAMBufferSizeMB(double mb) and you won't have to
> sacrifice anything. It defaults to 16.0 MB so depending on the size of your
> index you may want to make it larger. Do some testing at various values to
> see where the sweet spot is.
>
Also, have a look at
htt
Hello all,
I am also interested in this. I want to archive the content of documents
using Lucene.
Is it a good idea to use Lucene as a storage engine?
Regards
Ganesh
- Original Message -
From: "Ian Lea" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, July 29, 2008 2:18 PM
Subject: Re: Usi
Try IndexWriter.optimize(int maxNumSegments)
On Mon, Jul 28, 2008 at 11:30 PM, Dragon Fly <[EMAIL PROTECTED]> wrote:
> I'd like to shorten the time it takes to optimize my index and am willing
> to sacrifice search and indexing performance. Which parameters (e.g. merge
> factor) should I change?
John
I think it's a great idea, and do exactly this to store 5 million+
documents with info that it takes way too long to get out of our
Oracle database (think days). Not as many docs as you are talking
about, and less data for each doc, but I wouldn't have any concerns
about scaling. There are