Re: 2.3.2 Indexing Performance

2008-10-01 Thread Michael McCandless
Awesome! Thanks for following up. Mike Gary Moore wrote: Finally got back to this. The great bulk of the time is spent parsing/tokenizing. So, using 10 threads parsing/analyzing the 4.5M docs and feeding them to an IndexWriter took 106 minutes including a final optimization. The ind

bunch of newbie queries, PS

2008-10-01 Thread rolarenfan
OK, after googling around for a while, I found this: http://wooga.drbacchus.com/lucene-and-documentation (alas, I agree) and then eventually I realized that the download web-page directory has a link to the "archive" where all the old stuff is (probably this is common and obvious, but I kept

Re: 2.3.2 Indexing Performance

2008-10-01 Thread Gary Moore
Finally got back to this. The great bulk of the time is spent parsing/tokenizing. So, using 10 threads parsing/analyzing the 4.5M docs and feeding them to an IndexWriter took 106 minutes including a final optimization. The index is 5.6 GB. I'm tempted to try multiple indexing threads but
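
The setup described above (several threads parsing/analyzing documents and feeding one shared writer) can be sketched roughly as follows. This is a minimal illustration in plain Python, not Lucene code: `ToyIndexWriter`, `analyze`, and `index_all` are hypothetical names invented for the sketch, and the whitespace tokenizer stands in for whatever analyzer the poster actually used.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class ToyIndexWriter:
    """Hypothetical stand-in for a shared index writer; not Lucene's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.docs = []

    def add_document(self, tokens):
        # Serialize writes so many parser threads can share one writer.
        with self._lock:
            self.docs.append(tokens)


def analyze(text):
    # Assumed analyzer: lowercase and split on whitespace.
    return text.lower().split()


def index_all(texts, num_threads=10):
    writer = ToyIndexWriter()

    def work(text):
        # Parsing/tokenizing happens in the worker thread;
        # only the final add goes through the shared writer.
        writer.add_document(analyze(text))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consume the iterator so any worker exception is raised here.
        list(pool.map(work, texts))
    return writer
```

The point of the structure is that the expensive step (parsing) runs in the workers while the writer is touched only briefly; the order in which documents arrive at the writer is not guaranteed.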

Re: Lucene vs. Database

2008-10-01 Thread Petite Abeille
On Oct 1, 2008, at 9:43 AM, agatone wrote: I'm working on a project that has a big database in the background (some tables have about 150 rows). We decided to use Lucene for "faster" search. Our search works similarly to all searches: you write a search string, get a list of hits with detail link

Re: Lucene vs. Database

2008-10-01 Thread Marcelo Ochoa
Mathieu: > Crawling a DB is not a good idea. Indexing while writing/deleting is > clever. These operations also consume network traffic in architectures like Solr WS. Also there is a waste of network traffic when a query is filtered against relational data (slides 15 and 18 of Google presentati

Re: Lucene vs. Database

2008-10-01 Thread Matthew Hall
Another thing you could consider is that rather than meshing all this data into a single index, logically break out the data you need for searching into one index, and the data you need for display into another index. This is the technique we use here and it's been wildly successful for us, as

Re: Lucene vs. Database

2008-10-01 Thread mathieu
Crawling a DB is not a good idea. Indexing while writing/deleting is clever. Doing it inside the DB is a solution. Java users like ORM. Compass plugs Lucene indexing into the ORM's transaction. If something is written or deleted, Lucene is aware. Compass is open source. M. On Wed, 1 Oct 2008 09:12:41 -0300,

case studies

2008-10-01 Thread Erik Hatcher
Dear Lucene and Solr users - I'm presenting Lucene/Solr Case Studies at ApacheCon in a month: I would like to feature implementations by YOU. The thing is, my slides are due this Friday, so time is short to collect this info. If you have

Re: Lucene vs. Database

2008-10-01 Thread Marcelo Ochoa
Hi Zoran: One of the biggest issues with Lucene DB integration is the network traffic consumed as a consequence of indexing or updating operations, apart from transactionality, which can be relaxed in some applications. During our Oracle Open World presentation we presented some of these issues comp

Re: Lucene vs. Database

2008-10-01 Thread mathieu
Have a look at Compass: http://www.compass-project.org/ It's one of the easiest ways to mix DB and Lucene. M. On Wed, 1 Oct 2008 00:43:57 -0700 (PDT), agatone <[EMAIL PROTECTED]> wrote: > > Hi, > I asked this question already on "lucene-general" list but also got > advised > to ask here too. >

Re: Lucene vs. Database

2008-10-01 Thread Karsten F.
Hi agatone, I agree with markharw00 that highlighting is the main reason to store fields in Lucene. I want to remind Sascha Fahl that the stored fields in Lucene are not inside the inverted index structure. The implementation of stored fields is very simple: A (.fdt)-file with the pairs "field-name

Re: Lucene vs. Database

2008-10-01 Thread markharw00d
Pros of keeping content only in the database * Need only one stored copy of data (saved disk space) Pros of storing copy of content in Lucene: * A match is more easily explained If you collapse multiple DB fields into a single searchable field e.g. customer first name and surname database fiel

Re: Lucene vs. Database

2008-10-01 Thread Chris Lu
Since you have a lot of data, it wouldn't be wise to put that extra data into the Lucene index for storage. You will need to keep the data in sync, and update that extra data in the Lucene index unnecessarily. A Lucene index is just like a B-Tree index in a database. It's just an auxiliary data structure, and

Re: Lucene vs. Database

2008-10-01 Thread Sascha Fahl
Hi, there is a big conceptual difference. Lucene works with an inverted index, which means that you have a list of words (terms), each having a list of all documents that contain that word (term). Databases usually work with normal indexes, which means you have a document describing the w
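
To make the inverted-index idea above concrete, here is a small sketch in plain Python. It only illustrates the data structure (term to list of documents); it is not how Lucene stores its index, and the names `build_inverted_index` and `search`, as well as the whitespace tokenization, are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it.

    `docs` is assumed to be a dict of {doc_id: text}.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenization: lowercase, split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, term):
    # A lookup is a single dictionary access, independent of how
    # many documents do NOT contain the term.
    return sorted(index.get(term.lower(), set()))
```

For example, after indexing `{1: "Lucene is fast", 2: "databases are fast"}`, looking up `"fast"` yields both document ids, while `"lucene"` yields only the first.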

Lucene vs. Database

2008-10-01 Thread agatone
Hi, I asked this question already on "lucene-general" list but also got advised to ask here too. I'm working on a project that has a big database in the background (some tables have about 150 rows). We decided to use Lucene for "faster" search. Our search works similarly to all searches: you wri