Re: "shared fields"?

Erick Erickson Wed, 09 Mar 2011 06:49:02 -0800

How large is (large)? What machines are you intending to run this on?

In general, though, don't worry about index size until you actually have some
numbers to deal with. Solr's resource usage is generally driven by the number
of *unique* terms in an index, so repeating the same text in a bunch of
documents isn't as bad as you'd suppose.
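
To make the unique-terms point concrete, here is a toy inverted index in
Python. It is only a sketch of the general idea, not Lucene's actual on-disk
format, and the names (build_index, stats) are made up for illustration. The
term dictionary stays the same size however many times the text is repeated;
only the per-term lists of document ids grow.

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: maps each term to the list of doc ids containing it."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Each term is recorded once per document, however often it occurs.
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return postings

def stats(postings):
    """Return (number of unique terms, total number of posting entries)."""
    unique_terms = len(postings)
    posting_entries = sum(len(ids) for ids in postings.values())
    return unique_terms, posting_entries

bio = "brad pitt was born in 1963 in oklahoma"
one_copy = build_index([bio])
ten_copies = build_index([bio] * 10)

print(stats(one_copy))    # (7, 7)
print(stats(ten_copies))  # (7, 70)
```

The posting entries do grow with every copy, so duplication isn't free, but
each entry is just a document id rather than another copy of the text.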


If you *store* fields, the stored-field files in your index (*.fdt and *.fdx
in particular) will grow linearly with the amount of stored data, but these
aren't the ones that are used for searching. (*.fnm only lists the field
names, so it stays small.)
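
If you want numbers rather than guesses, a small script can total the index
directory by file extension. This is a minimal sketch: size_by_extension and
report are hypothetical helper names, and the directory you pass in is
whatever your own install uses for its index.

```python
import os
from collections import defaultdict

def size_by_extension(index_dir):
    """Sum file sizes in index_dir, grouped by extension (e.g. '.fdt')."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(path)
    return dict(totals)

def report(index_dir):
    """Print one line per extension, largest first."""
    by_size = sorted(size_by_extension(index_dir).items(),
                     key=lambda kv: kv[1], reverse=True)
    for ext, size in by_size:
        print(f"{ext:8} {size:>14,} bytes")
```

Call report() with the path to your index directory before and after
indexing the duplicated data, and compare which extensions grew.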

So I'd go ahead and just replicate the data, then monitor your system,
cache behavior in particular (see the admin/stats page). Normalizing your
data is tricky in Solr, so don't do it unless it proves necessary, IMO...

Best
Erick

On Wed, Mar 9, 2011 at 9:38 AM, sol myr <solmy...@yahoo.com> wrote:
> Hi,
>
> I have several documents that share the same (large) searchable data.
> For example, say my documents represent movies, and 2 movies share the same
> actorBiography of Brad Pitt (assuming I want to search movies by
> actorBiography words, far-fetched as it might seem):
>
>
> Document1:
> - movieName="Benjamin Button"
> - actorBiography="Brad Pitt was born in 1963 in Oklahoma and raised in..."
>
> Document2:
> - movieName="Ocean 11"
> - actorBiography="Brad Pitt was born in 1963 in Oklahoma and raised in..."
>
> My question: I'm afraid my index files will become very large, due to the
> duplication of information. Is there any trick that would keep my index files
> at a reasonable size, while still allowing the functionality of "search movie
> by actorBiography"?
> Thanks :)

