Hi, thanks for your answers, you really helped me make the right decision. I now have a fully denormalized second index, which is much easier to handle than my earlier attempt that mimicked the DB schema, and I don't have any speed problems.
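
As a rough illustration of the denormalized layout described above (not the actual code or schema from this thread), the sketch below uses plain Java collections in place of Lucene documents. The names DenormalizeSketch, denormalize, search, and the fields pk, category, and color are all hypothetical. The assumption is one flat document per detail row, each carrying a copy of its parent's fields, so that a single search can apply primary and secondary constraints together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: maps stand in for Lucene documents, no Lucene API is used.
final class DenormalizeSketch {

    /** Builds one flat document per detail row, copying in the parent's fields. */
    static List<Map<String, String>> denormalize(
            Map<String, Map<String, String>> primaryByPk,
            List<Map<String, String>> details) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> detail : details) {
            Map<String, String> parent = primaryByPk.get(detail.get("pk"));
            if (parent == null) {
                continue; // orphaned detail row: no parent to copy fields from
            }
            Map<String, String> doc = new LinkedHashMap<>(parent);
            doc.putAll(detail); // detail fields win if a field name collides
            docs.add(doc);
        }
        return docs;
    }

    /** Returns the distinct pks of documents that match every field=value constraint. */
    static Set<String> search(List<Map<String, String>> docs,
                              Map<String, String> constraints) {
        Set<String> pks = new LinkedHashSet<>();
        for (Map<String, String> doc : docs) {
            boolean matches = true;
            for (Map.Entry<String, String> c : constraints.entrySet()) {
                if (!c.getValue().equals(doc.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                pks.add(doc.get("pk")); // several detail rows may share one pk
            }
        }
        return pks;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> primary = Map.of(
                "1", Map.of("category", "shoes"),
                "2", Map.of("category", "shirts"));
        List<Map<String, String>> details = List.of(
                Map.of("pk", "1", "color", "red"),
                Map.of("pk", "1", "color", "blue"),
                Map.of("pk", "2", "color", "red"));

        List<Map<String, String>> docs = denormalize(primary, details);

        // One pass over one collection answers a query that spans both record types.
        System.out.println(search(docs, Map.of("category", "shoes", "color", "red"))); // [1]
        System.out.println(search(docs, Map.of("color", "red")));                      // [1, 2]
    }
}
```

The cost of this layout is that parent fields are duplicated on every detail document. With indexes of a few MB, as in this thread, that duplication is unlikely to matter.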

It seems Lucene's mailing list is just as great as the code. :-)

Regards,

Michael

> -----Original Message-----
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 28 June 2007 20:45
> To: java-user@lucene.apache.org
> Subject: Re: Searching over multiple indexes with 1:m relationship
>
> Chris is spot-on. Your data set is so small that I wouldn't worry about
> speed unless and until you have proof that it's a problem. The complexity
> you'll introduce by having multiple indexes just won't be worth it.
>
> In your case, following Chris's advice and de-normalizing the data would
> be the first thing you should try.
>
> Erick.
>
> On 6/28/07, Michael Böckling <[EMAIL PROTECTED]> wrote:
> >
> > Hi Erickson,
> >
> > thanks for your reply.
> >
> > Of course you are right that it's a bit insane to mimic a database
> > schema with indices, but that's how it is. The primary index is already
> > in use; the extended requirements came later.
> >
> > The index isn't really that big. The primary one has 2-3 MB of data. I
> > don't know yet how big the secondary one will be, but probably less
> > than 20 MB. The idea was that most searches will only need the first
> > index; the secondary index is only queried through an extended search
> > form. Keeping the first index small should help with performance, since
> > that is where the main load is handled.
> >
> > The number of primary results will often be less than 200, typically
> > around 20 I guess, so it's not that big of a deal to iterate through
> > them.
> >
> > Regards,
> >
> > Michael
> >
> > > -----Original Message-----
> > > From: Erick Erickson [mailto:[EMAIL PROTECTED]
> > > Sent: Thursday, 28 June 2007 16:09
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Searching over multiple indexes with 1:m relationship
> > >
> > > I do have an off-the-wall question... Why have two indexes? There
> > > are, of course, good reasons, but they're things like size and speed.
> > >
> > > Where I'm going here is that Lucene does NOT require that all
> > > documents have the same fields. So it's perfectly reasonable to index
> > > heterogeneous data (or differing forms of the same data) in a single
> > > index. This may not fit your requirements, but I thought I'd mention
> > > it.
> > >
> > > That said, it really doesn't bear on your question since you'd really
> > > have two logical indexes in the same physical index. Although maybe
> > > it does. If all the data were in one index, then perhaps you could do
> > > exactly one search instead.
> > >
> > > I'm always leery of using an index to mimic what looks like database
> > > functionality. That often means that you either should actually use a
> > > database for the database-like parts or get much more clever in your
> > > index so you don't need what are essentially joins.
> > >
> > > All that said, a lot depends on the data set size. If your first
> > > query results in, say, 100 documents (pks) that you need to use for
> > > your second query, it probably doesn't matter whether you do a lot of
> > > manual processing. If the first query results in 1,000,000 pks, then
> > > it does....
> > >
> > > So how much data are you talking about? Even the single-index idea
> > > depends upon whether we're talking a couple of G index size or a
> > > couple of T...
> > >
> > > Best
> > > Erick
> > >
> > > On 6/28/07, Michael Böckling <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi folks!
> > > >
> > > > I know there is a MultiSearcher for searching over multiple
> > > > indices, but my requirement is a bit special.
> > > > I have two indices whose documents have a 1:m relationship. Most
> > > > queries will only use the primary index, but some will have to look
> > > > for detailed information in the secondary index (the index fields
> > > > are of course different).
> > > >
> > > > What I plan to do:
> > > > - first get the results from the primary index
> > > > - then use the pk of the found documents and the additional search
> > > >   constraints to search in the secondary index
> > > > - discard any primary results that did not match in the secondary
> > > >   index
> > > >
> > > > Is this ok, or am I completely nuts by doing that? Is there a
> > > > better alternative?
> > > >
> > > > Thanks for any clues!
> > > >
> > > > Michael
> > > >
> > > > --
> > > > Michael Böckling
> > > > Java Engineer
> > > > dmc digital media center GmbH
> > > > Rommelstraße 11
> > > > 70376 Stuttgart (Germany)
> > > > Phone: +49 711 601747-0
> > > > Fax: +49 711 601747-141
> > > > E-mail: [EMAIL PROTECTED]
> > > > Internet: www.dmc.de
> > > >
> > > > Commercial register: AG Stuttgart HRB 18974
> > > > Managing directors: Andreas Magg, Daniel Rebhorn, Andreas Schwend
> > > >
> > > > ---------------------------------------------
> > > > Better e-business.
> > > > dmc is the creative combination of agency, systems integrator, and
> > > > service. For more than 10 years we have been developing and
> > > > implementing forward-looking, successful e-business solutions. Our
> > > > long-standing customers include neckermann.de, Kodak, and Telekom
> > > > Training.
> > > >
> > > > dmc is ranked 8th in the current New Media Service Ranking.
> > > > As an owner-managed, network-independent agency with revenue of
> > > > EUR 13.50 million, we are among the top 10 most successful new
> > > > media service providers in Germany.
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]