I'm already late to the party, but +1 on Mike McCandless's comment on RAM. There have been a number of efforts in the past to move data structures off-heap. FWIW, you can use the Lucene FST (a trie-based data structure used in many different places, including the term index and synonym dictionaries) to build a large (>>32GB) FST with only a small amount of heap (a few to tens of MB, depending on how aggressively we want to minimize the FST). That mode has not been incorporated into Lucene's default index codec yet, though. In practice the FST-based term index usually doesn't require much memory anyway: the index is split into smaller segments that operate independently, and each segment's FST stores only prefixes of terms rather than the whole terms.
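To make the prefix-sharing point concrete, here is a tiny self-contained sketch (a plain trie, not Lucene's actual FST, which additionally shares suffixes and serializes to a compact byte form; the class and term list are made up for illustration). Inserting four terms that share the prefix "index" stores those five characters once:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixTrieDemo {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
    }

    // Counts every node, including the root.
    static int nodeCount = 1;

    static void insert(Node root, String term) {
        Node cur = root;
        for (char c : term.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, k -> {
                nodeCount++;
                return new Node();
            });
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        String[] terms = {"index", "indexes", "indexing", "indexed"};
        int totalChars = 0;
        for (String t : terms) {
            insert(root, t);
            totalChars += t.length();
        }
        // "index" (5 chars) is stored once; only the distinct suffix
        // characters ("es", "ing", "d") add nodes beyond it.
        System.out.println("chars=" + totalChars + " trieNodes=" + (nodeCount - 1));
    }
}
```

Here 27 input characters collapse to 11 trie nodes; over millions of real terms with long shared prefixes, that compression is what keeps the term index small.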
One caveat: I don't think vector index *building* is off-heap yet, though it could be (or would we be better off using an on-disk data structure directly?). Also, even at search time, "off-heap" means relying on OS memory mapping to keep hot / frequently accessed pages in memory, so having enough RAM is still critical. How much depends on the access pattern (how frequently specific portions of the index on disk are touched), which needs to be tuned per system.

On Fri, Jan 5, 2024 at 3:51 Ralf Heyde <ralf.he...@gmx.de.invalid> wrote:

> Hi Vincent,
>
> My 2 cents:
>
> We had a production environment with ~250 GB and ~1M docs with static +
> dynamic fields in Solr (AFAIR Lucene 7), on a machine with 4 GB for the
> JVM and (AFAIR) a bit more, maybe 6 GB, for the OS cache.
> At peak times (re-index) we had 10-15k updates/minute and (partially)
> complex queries at up to 50/sec per JVM. Back then our servers still had
> rotating disks.
>
> With this setup we did not experience any performance issues, as long as
> we had no bugs / misconfigurations.
>
> We considered sharding / splitting the indexes, but did not do it, due to
> the complexity of maintaining them later - AND, especially, because there
> was NO NEED at all.
>
> Elasticsearch/Solr started to do this out of the box around that time.
> Maybe Kibana/ELK or similar is worth a look too.
>
> Cheers from Berlin, Ralf
>
> Sent from my phone; I can't rule out the occasional typo.
>
> > On 04.01.2024 at 17:32, Michael McCandless <luc...@mikemccandless.com> wrote:
> >
> > Hi Vincent,
> >
> > Lucene has a hard limit of ~2.1 B documents in a single index; hopefully
> > you hit the ~50 - 100 GB limit well before that.
> >
> > Otherwise it's very application dependent: how much latency can you
> > tolerate during searching, how fast are the underlying IO devices at
> > random and large sequential IO, the types of queries, etc.
> > Lucene should not require much additional RAM as the index gets larger --
> > much work has been done in recent years to move data structures off-heap.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >> On Tue, Jan 2, 2024 at 9:49 AM <vvse...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> is there a recommended / rule-of-thumb maximum size for an index?
> >> I try to target between 50 and 100 GB before spreading to other servers.
> >> Or is this just a matter of how much memory and CPU I have?
> >> This is a log aggregation use case: a lot of writes and, obviously, a
> >> smaller number of reads.
> >> I am using Lucene 9.
> >> Thanks,
> >> Vincent
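Back to the page-cache point above: here is a minimal stdlib-only sketch (java.nio, no Lucene; the class name and file layout are made up for illustration) of what "off-heap" reads look like. The mapped bytes live in the OS page cache, not on the JVM heap; the first touch of a page may fault it in from disk, while hot pages are served from memory:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Exposed so the result of the scan can be inspected after main runs.
    static long lastChecksum;

    public static void main(String[] args) throws IOException {
        // Write a small stand-in for an index segment file.
        Path file = Files.createTempFile("segment", ".dat");
        byte[] data = new byte[1 << 20]; // 1 MB
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 128);
        Files.write(file, data);

        // Map the file read-only: no heap allocation for the file contents.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touch one byte per 4 KB page, like a sparse random-access
            // pattern over an index; the OS decides what stays resident.
            long sum = 0;
            for (int i = 0; i < buf.limit(); i += 4096) {
                sum += buf.get(i);
            }
            lastChecksum = sum;
            System.out.println("checksum=" + sum);
        }
        Files.deleteIfExists(file);
    }
}
```

This is why "enough RAM" still matters with off-heap indexes: if the working set of pages your queries touch exceeds free memory, those `buf.get` calls turn into disk reads.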