> -----Original Message----- > From: Chris Lamprecht [mailto:[EMAIL PROTECTED] > Sent: Thursday, April 28, 2005 7:53 PM > > Since the "stored fields" index would basically just be a > database, perhaps this is better served using a traditional > relational database (or even use the OS's file system). > However, a traditional database doesn't know what the docids > are (and docids change over time), -- may be one could store > a single stored field in the "search index", containing a > permanent identifier for the DB lookup.
Is it too late to revisit this? I just ran across it today and conicidentally have been thinking about the same issue. I think you've hit on a great point. Lucene is optimized for searching, while a relational database is optimized for ID lookups, so we're trying the scheme you've described above. We are doing this to strike a right balance between Lucene index size (keeping it small) and performance (keeping db hits to a minimum). We are developing a web paging application, which only displays 10 results at a time. Suppose in Lucene we have WeblogID indexed as a keyword and WeblogContent indexed as UnStored. We completely ignore Lucene's internal docid. The searching process would look like this: 1) Search and get back a Hits collection 2) Loop through the current page of results, grabbing the WeblogIDs for the 10 results to be displayed. 3) Incorporate those WeblogIDs into an sql query like so: SELECT WeblogContent, OtherFields FROM Weblogs, OtherTables WHERE WeblogID IN (ID1, ID2, ID3...). 4) Use the results of this sql query to format the output. An IN query with 10 parameters will be fast, especially if there's an index on those IDs. This works will since I'm paging 10 results at a time; a sql query with 1000 parameters wouldn't be as friendly. You'll also notice the query above does a join; when the data is decoupled from the Lucene index, you're free to grab it from various sources (which may or may not be available at index time). This method gives you that flexibility. Monsur --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]