How large is "large"? What machines are you intending to run this on? In general, though, don't worry about index size until you actually have some numbers to work with. Solr's resource issues are generally driven by the number of *unique* terms in an index, so repeating the same thing in a bunch of documents isn't as bad as you'd suppose.
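
To make the unique-terms point concrete, here is a toy sketch in plain Python. It is not Lucene's actual index format, and build_index is a made-up helper for illustration only. It indexes the same biography once and then twice: the term dictionary stays the same size, and only the per-term document lists get longer.

```python
from collections import defaultdict


def build_index(docs):
    """Toy inverted index (hypothetical helper): term -> list of doc ids."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Each distinct term in a document contributes one posting.
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return index


bio = "Brad Pitt was born in 1963 in Oklahoma"

one_copy = build_index([bio])
two_copies = build_index([bio, bio])

# The term dictionary is keyed by unique terms, so duplication does not grow it.
unique_terms_one = len(one_copy)
unique_terms_two = len(two_copies)

# The postings (doc id lists) do grow with the number of documents.
postings_one = sum(len(ids) for ids in one_copy.values())
postings_two = sum(len(ids) for ids in two_copies.values())

print(unique_terms_one, unique_terms_two, postings_one, postings_two)
```

So duplication isn't free, but the part that tends to hurt (the set of unique terms) doesn't change.
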
If you *store* fields, certain files in your index will grow linearly, but these aren't the ones that are used for searching. *.fdt and *.fdx in particular will grow (*.fnm only holds field metadata, so it stays small).

So I'd go ahead and just replicate the data, then monitor your system, in particular for cache issues (see the admin/stats page). Normalizing your data is tricky in Solr, so don't do it unless it proves necessary, IMO...

Best
Erick

On Wed, Mar 9, 2011 at 9:38 AM, sol myr <solmy...@yahoo.com> wrote:
> Hi,
>
> I have several documents that share the same (large) searchable data.
> For example, say my Documents represent movies, and 2 movies share the same
> actorBiography of Brad Pitt (assuming I want to search movies by
> actorBiography words, far-fetched as it might seem):
>
> Document1:
> - movieName="Benjamin Button"
> - actorBiography="Brad Pitt was born in 1963 in Oklahoma and raised in..."
>
> Document2:
> - movieName="Ocean 11"
> - actorBiography="Brad Pitt was born in 1963 in Oklahoma and raised in..."
>
> My question: I'm afraid my index files will become very large, due to the
> duplication of information. Is there any trick that would keep my index files
> in a reasonable size, while still allowing the functionality of "search movie
> by actorBiography"?
>
> Thanks :)