Just read through this, and don't yet have any concrete ideas better than
what's been given, but I'm interested to clarify one thing you said:

> We are having 6 shards spread across 96 replicas. Each replica is hosted on
> a dedicated EC2 instance, no more than one replica present on the same
> machine

Does that imply 6 x 96 physical machines (= 576 pieces of hardware), or
are you overlapping replicas for different shards on the same machine (=
576 processes on 96 pieces of hardware), or overlapping them on the same
Solr node (96 processes on 96 pieces of hardware)?

The last one is much more common. If you've really got 576 Java processes
running Solr, that's a fair bit of communication that needs to happen as
replicas go up and down.
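
If it helps to pin down which of the three layouts you actually have, here is
a rough sketch that counts shards, replicas and distinct Solr nodes. It
assumes you have already fetched and parsed the JSON from the Collections API
CLUSTERSTATUS action, and that replicas carry a node_name field as they
normally do there. The function name and the returned keys are my own
invention, not anything Solr provides.

```python
from collections import Counter


def summarize_topology(cluster_status: dict, collection: str) -> dict:
    """Count shards, replicas and distinct Solr nodes for one collection.

    `cluster_status` is assumed to be the parsed JSON body of
    /solr/admin/collections?action=CLUSTERSTATUS.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]

    # Tally how many replicas (cores) each Solr node is hosting.
    replicas_per_node = Counter()
    for shard in shards.values():
        for replica in shard["replicas"].values():
            replicas_per_node[replica["node_name"]] += 1

    return {
        "shards": len(shards),
        "replicas": sum(replicas_per_node.values()),
        "nodes": len(replicas_per_node),
        "max_replicas_per_node": max(replicas_per_node.values(), default=0),
    }
```

If "nodes" equals "replicas" and "max_replicas_per_node" is 1, you have one
core per Solr process. Note this only counts Solr nodes, so it cannot tell
you whether several of those processes share a physical machine.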

Have you observed any slowness on ZooKeeper during these episodes?
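
One cheap way to check is ZooKeeper's "mntr" four-letter command, sampled
before and during a reload. The sketch below is untested against your setup
and assumes mntr is allowed via 4lw.commands.whitelist on your ZooKeeper
servers (it is not enabled by default on recent versions). The helper names
and the host in the usage note are placeholders of mine.

```python
import socket


def zk_mntr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'mntr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        # ZooKeeper closes the connection once the reply has been sent.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(raw: str) -> dict:
    """Turn tab-separated 'key<TAB>value' lines into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip anything that is not a key/value line
            stats[key.strip()] = value.strip()
    return stats
```

For example, parse_mntr(zk_mntr("zk-0.example.internal")) and then compare
zk_avg_latency, zk_max_latency and zk_outstanding_requests between a quiet
period and a failing reload.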


On Thu, Jan 19, 2023 at 9:58 AM Houston Putman <hous...@apache.org> wrote:

> > I was wondering, could it be something wrong with the solrconfig.xml
> > parameters? Perhaps, a combination of parameters does not behave stable?
> > Do you think it makes sense to go with a vanilla solrconfig.xml and
> > introduce all the custom options one-by-one (i.e. ShardHandlerFactory,
> > etc.)?
>
>
> That is a great idea. (Obviously with the operator you need to keep some of
> the values there that it relies on, but I think everything it uses is
> vanilla starting with Solr 9)
>
> - Houston
>
> On Thu, Jan 19, 2023 at 9:43 AM Nick Vladiceanu <vladicean...@gmail.com>
> wrote:
>
> > Thanks Kevin for looking into it.
> >
> > I’ll answer the questions in the original order:
> > * Pod volume has the correct permissions. Basically, we use emptyDir
> > provisioned by the solr-operator. All the nodes are having exactly the
> same
> > setup. No pods are co-located on the same worker node. No more than one
> > Solr core is located on the same node.
> > * We are actively indexing and querying. We do also use partial updates.
> > Since we use TLOG replica types, we have a hard commit of 180s that
> opens a
> > new searcher.
> > * JDK 11 and JDK 17 behave the same way. We were able to reproduce on
> > both builds.
> >
> > As per directory exceptions, I also cannot understand why it is throwing
> > that Unknown Directory exception. I have logged into a Solr pod that
> was
> > throwing this error and was able to find the exact location on the disk
> > existing.
> >
> > When reload fails, sometimes it might fail on one node, other times fail
> > on multiple nodes at the same time. I was checking all the logs on the
> k8s
> > node and on the pod but couldn’t find anything related to the disk,
> > network, or other errors.
> >
> > I was wondering, could it be something wrong with the solrconfig.xml
> > parameters? Perhaps, a combination of parameters does not behave stable?
> > Do you think it makes sense to go with a vanilla solrconfig.xml and
> > introduce all the custom options one-by-one (i.e. ShardHandlerFactory,
> > etc.)?
> >
> > ---
> > Nick Vladiceanu
> > vladicean...@gmail.com
> >
> >
> >
> >
> > > On 18. Jan 2023, at 18:41, Kevin Risden <kris...@apache.org> wrote:
> > >
> > > So I am going to share some ideas just in case it triggers something -
> I
> > > have this gut feel that the cores are closing due to an exception of
> some
> > > kind. It seems like a lot of the issue is either index corruption or
> > > "SolrCoreState already closed."
> > >
> > > * Does the pod volume have the correct permissions for Solr to
> > read/write?
> > > * Are you indexing these nodes or just querying? (asking this if these
> > are
> > > meant to be read only that would be different than a changing index)
> > > * Have you taken into account
> > > https://issues.apache.org/jira/browse/SOLR-16463 by chance if you
> have a
> > > custom Docker image? (this might not be necessary since you say it
> > > reproduces on JDK 11)
> > >
> > > I found this part of your update the most intriguing. Why would
> changing
> > > the directory factory change this? My understanding is everything under
> > > "/var/solr/data/my_collection_shard3_replica_t1643" should be
> controlled
> > by
> > > Solr both read/write so any directories underneath would be created
> > > automatically.
> > >
> > > directoryFactory:
> > >> https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core
> > >> /MMapDirectoryFactory.html throwing the following exception:
> > >> o.a.s.c.SolrCore java.lang.IllegalArgumentException: Unknown
> directory:
> > >> MMapDirectory@
> > /var/solr/data/my_collection_shard3_replica_t1643/data/snapshot_metadata
> > >> (we do not use snapshots at all) (stack trace
> > https://justpaste.it/88en6)
> > >> Switched to
> > https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core
> > >> /StandardDirectoryFactory.html; problem solved, no more Unknown
> > directory
> > >> exceptions
> > >> Reload won’t fail on some nodes with Unknown directory exception;
> > >> Result: reload still timing out, fewer exceptions;
> > >
> > >
> > > My guess is that the reload is going to some node and that one node is
> > > causing the whole process to timeout. If you find that node then you
> > should
> > > be able to collect the logs and see. Basically there is some reason the
> > > cores are closing and it's not good. I would guess the collection reload
> > > timing out is just a symptom of whatever the bigger underlying cause
> is.
> > >
> > > Kevin Risden
> > >
> > >
> > > On Wed, Dec 21, 2022 at 5:57 AM Nick Vladiceanu <
> vladicean...@gmail.com>
> > > wrote:
> > >
> > >> yes, it’s very unusual. On Solr 8.11 (and previous versions) with
> setup
> > >> and size of data, reload takes just a few seconds.
> > >>
> > >> We are having 6 shards spread across 96 replicas. Each replica is
> hosted
> > >> on a dedicated EC2 instance, no more than one replica present on the
> > same
> > >> machine (in k8s words, it's one pod per node, one solr replica per
> pod).
> > >>
> > >> I am able to reproduce the reload issue on Solr 9.0 and 9.1. Tried to
> > >> isolate the underlying node along with the Solr Pod, couldn’t identify
> > any
> > >> issues like high load, iowait, whatsoever. Only issues I see are
> > exceptions
> > >> in the Solr logs that never recover unless Pods are restarted (we use
> > empty
> > >> dir and not persistent volume, every time a pod is restarted, the cores
> > that
> > >> were hosted on it are removed and when the pod comes back it gets
> > allocated
> > >> to the same shard or another shard, depending on how many replicas
> other
> > >> shards have, we try to keep balance in # of replicas)
> > >>
> > >> I agree that 23GB of heap is a bit too much and are doing some work to
> > >> optimize it (resizing caches, etc.). We tried to lower the heap to
> 20GB
> > >> already and GC performance is better, and in general Solr performs
> > better.
> > >> I must mention that we have the same heap size in Solr 8.11 and it
> > doesn’t
> > >> cause any issues with the reload. Could it have an impact on Solr 9,
> > >> somehow?
> > >>
> > >> Thank you a lot for sharing your thoughts, especially for explaining
> GC
> > >> params and sharing yours, very much appreciated.
> > >>
> > >> Do you have any ideas on what we should try more? Here is the digest
> of
> > >> what we have tried, but without any success:
> > >>
> > >> Zookeeper: Upgrade from 3.6 to 3.7 and 3.8: no impact;
> > >>
> > >> DNS: Solr pods joining and communicating over Pod IP instead of Pod
> Svc
> > >> DNS name (headless). This was done in order to avoid any potential
> > issues
> > >> (even though CoreDNS/nodelocaldns metrics looked Ok) with DNS
> > resolvers; no
> > >> impact;
> > >>
> > >> Lucene: upgrade to Lucene 9.1.0, 9.2.0, 9.3.0; no impact;
> > >>
> > >> Solr Nodes:
> > >> version:
> > >> tried Solr 9.0.0 and Solr 9.1.0
> > >> Result: no difference;
> > >> Heap:
> > >> recalculate the Heap size;
> > >> reduce the size by 3GB (15%) in combination with caches resize (see
> > below);
> > >> Result: better performance, no old GC is triggered; cluster more
> stable;
> > >> reload still timing out;
> > >> TLOG and PULL:
> > >> tested with 3TLOG replicas per shard, the rest 12 replicas PULL;
> > >> tested all 15 replicas per shard of type TLOG;
> > >> NRT is not an option at all, didn’t even try to test;
> > >> Result: better response time with PULL, no impact on reload;
> > >> Other tunings, including gc; no impact;
> > >>
> > >> solrconfig.xml:
> > >> directoryFactory:
> > >>
> > >>
> >
> https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/MMapDirectoryFactory.html
> > >> throwing the following exception:  o.a.s.c.SolrCore
> > >> java.lang.IllegalArgumentException: Unknown directory: MMapDirectory@
> > /var/solr/data/my_collection_shard3_replica_t1643/data/snapshot_metadata
> > >> (we do not use snapshots at all) (stack trace
> > https://justpaste.it/88en6)
> > >> Switched to
> > >>
> >
> https://solr.apache.org/docs/9_1_0/core/org/apache/solr/core/StandardDirectoryFactory.html
> > ;
> > >> problem solved, no more Unknown directory exceptions
> > >> Reload won’t fail on some nodes with Unknown directory exception;
> > >> Result: reload still timing out, fewer exceptions;
> > >> lockType:
> > >> Switched between “native” and “simple” lock type;
> > >> Result: no impact;
> > >> HttpShardHandlerFactory
> > >> Increased timeout by 40% for cross-shards communication while doing
> > >> queries;
> > >> Result: no impact;
> > >> filterCache,queryResultCache and documentCache:
> > >> Limit the size of the caches to Megabyte instead of entries:
> > >> filterCache - 1024MB
> > >> queryResultCache - 1024MB
> > >> documentCache - 2048MB
> > >> Result: nodes are more stable during reload, cluster is not
> > destabilizing,
> > >> no old GC activity; better response time, less pressure on GC;
> > >> circuitBreaker:
> > >> disabling circuitBreaker;
> > >> Result: no impact;
> > >>
> > >> ---
> > >> Nick Vladiceanu
> > >> vladicean...@gmail.com
> > >>
> > >>
> > >>
> > >>
> > >>> On 20. Dec 2022, at 15:58, Shawn Heisey <apa...@elyograg.org> wrote:
> > >>>
> > >>> On 12/20/22 06:34, Nick Vladiceanu wrote:
> > >>>> Thank you Shawn for sharing, indeed useful information.
> > >>>> However, I must say that we only used deleteById and never
> > >> deleteByQuery. We also only rely on the auto segment merging and not
> > >> issuing optimize command.
> > >>>
> > >>> That is very unusual.  I've never seen a core reload take more than a
> > >> few seconds, even when I was dealing with core sizes of double-digit
> GB.
> > >> Unless you have hundreds or thousands of replicas for each of your 6
> > >> shards, it really should complete very quickly.
> > >>>
> > >>> Have you been able to determine which Solr cores in the collection
> are
> > >> causing the delay, and take a look at those machines?
> > >>>
> > >>> Some thoughts:
> > >>>
> > >>> When you said 96 nodes, were you talking about Solr instances or
> > >> servers?  You really should only run one Solr instance per server,
> > >> especially for a small index like this.
> > >>>
> > >>> A 23GB heap seems very excessive for a 4.7GB index that has less
> than 4
> > >> million documents.  I'm sure you can reduce that by a lot and
> encounter
> > >> smaller GC pauses as a result.  If you can share your GC logs, I
> should
> > be
> > >> able to provide a recommendation.
> > >>>
> > >>> I've been looking at what MinHeapFreeRatio and MaxHeapFreeRatio do.
> > >> Those settings are probably unnecessary.  This is what I currently use
> > for
> > >> GC tuning on JDK 11 or JDK 17.  This produces EXTREMELY short
> collection
> > >> pauses, but I have noticed that throughput-heavy things like indexing
> > run a
> > >> bit slower, but if the indexing is multi-threaded, I think that it
> would
> > >> not be affected a lot.
> > >>>
> > >>> GC_TUNE=" \
> > >>> -XX:+UnlockExperimentalVMOptions \
> > >>> -XX:+UseZGC \
> > >>> -XX:+ParallelRefProcEnabled \
> > >>> -XX:+ExplicitGCInvokesConcurrent \
> > >>> -XX:+UseStringDeduplication \
> > >>> -XX:+AlwaysPreTouch \
> > >>> -XX:+UseNUMA \
> > >>> "
> > >>>
> > >>> ZGC has one unexpected disadvantage.  Using it will disable
> Compressed
> > >> OOPs -- meaning that even with a heap smaller than 32GB, it uses 64
> bit
> > >> pointers.  This hasn't really impacted me ... the index is so small
> that
> > >> with a 1GB heap I have more than enough.  If low pauses are the most
> > >> important thing you need from GC and you're running at least JDK11, I
> > would
> > >> strongly recommend ZGC.  It does make indexing slower for me -- a full
> > >> rebuild that takes 10 minutes with G1 takes 11 minutes with ZGC. But
> > even
> > >> the worst-case GC pauses are single-digit milliseconds.
> > >>>
> > >>> For G1GC, which is still the best option for JDK8, this is what I
> used
> > >> to have:
> > >>>
> > >>> #GC_TUNE=" \
> > >>> #  -XX:+UseG1GC \
> > >>> #  -XX:+ParallelRefProcEnabled \
> > >>> #  -XX:MaxGCPauseMillis=100 \
> > >>> #  -XX:+ExplicitGCInvokesConcurrent \
> > >>> #  -XX:+UseStringDeduplication \
> > >>> #  -XX:+AlwaysPreTouch \
> > >>> #  -XX:+UseNUMA \
> > >>> #"
> > >>>
> > >>> Thanks,
> > >>> Shawn
> > >>
> > >>
> >
> >
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
