Soft commit takes 5 seconds in Solr 8.9.0

2023-09-28 Thread John Jackson
Hello

We are using Solr 8.9.0. We have configured SolrCloud with 2 shards, and
each shard has one replica. We use a 5-node ZooKeeper ensemble for SolrCloud.

We have used the below schema fields in the employee collection:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" docValues="true"/>
<field name="…" type="text" indexed="true" stored="true" multiValued="true"/>

*Total no. of records:* 8562099
*Size of instance:* solrgnrls2r1 67 GB, solrgnrls1 66 GB, solrgnrls1r1 66 GB, solrgnrls2 68 GB

*Solr logs:*

2023-09-14 10:04:30.705 DEBUG (qtp1984975621-8805766) [c:forms s:shard1 r:core_node3 x:forms_shard1_replica_n1] o.a.s.u.DirectUpdateHandler2 updateDocuments(add{_version_=1777003156686766080,id=EMP5487098118986160})
2023-09-14 10:04:30.710 INFO  (qtp1984975621-8805766) [c:forms s:shard1 r:core_node3 x:forms_shard1_replica_n1] o.a.s.u.p.LogUpdateProcessorFactory [forms_shard1_replica_n1]  webapp=/solr path=/update params={wt=javabin&version=2}{add=[FORM5487098118986160 (1777003156686766080)]} 0 5
2023-09-14 10:04:30.807 DEBUG (commitScheduler-930-thread-1) [c:employee s:shard1 r:core_node3 x:employee_shard1_replica_n1] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2023-09-14 10:04:35.134 DEBUG (commitScheduler-930-thread-1) [c:employee s:shard1 r:core_node3 x:employee_shard1_replica_n1] o.a.s.s.SolrIndexSearcher Opening [Searcher@796ab9b9[employee_shard1_replica_n1] main]
2023-09-14 10:04:35.413 DEBUG (commitScheduler-930-thread-1) [c:employee s:shard1 r:core_node3 x:employee_shard1_replica_n1] o.a.s.u.DirectUpdateHandler2 end_commit_flush


Why is the commitScheduler thread taking 5 seconds to complete? Because of
this, we cannot promptly see the latest update for id EMP5487098118986160. We
also have another collection with an index size of 120 GB and 744620373
documents, yet its soft commits show no slowness.

When we checked the Solr source code, we found that the time is spent in:

ExitableDirectoryReader.wrap(UninvertingReader.wrap(reader,
    core.getLatestSchema().getUninversionMapper()), SolrQueryTimeoutImpl.getInstance());
this.leafReader = SlowCompositeReaderWrapper.wrap(this.reader);

How can we troubleshoot the issue?


Re: Soft commit takes 5 seconds in Solr 8.9.0

2023-09-28 Thread Jan Høydahl
How many docs have you added before the softCommit?
Do you use any cache warming or other commit hooks?

Jan
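
For readers following along: commit hooks and warming of the kind Jan asks
about would be configured in solrconfig.xml. A hypothetical sketch (the
listener and query shown are made up for illustration, not taken from the
poster's config):

```xml
<!-- Hypothetical sketch: a newSearcher warming listener in solrconfig.xml.
     Listeners like this run every time a commit (including a soft commit)
     opens a new searcher; expensive warming queries can add seconds to
     commit time. The query below is illustrative only. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">id asc</str>
    </lst>
  </arr>
</listener>
```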

> On 28 Sep 2023, at 13:28, John Jackson wrote:



Re: Soft commit takes 5 seconds in Solr 8.9.0

2023-09-28 Thread John Jackson
How many docs have you added before the softCommit?

>> Only one record, EMP5487098118986160, was added.

Do you use any cache warming or other commit hooks?

>> No, we are not using any cache warming or commit hooks; our commit settings in solrconfig.xml are below:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>2</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>100}</maxTime>
</autoSoftCommit>


We index via ZooKeeper and do not commit explicitly after indexing, because
auto-commit is configured in solrconfig.xml.

On Thu, Sep 28, 2023 at 5:15 PM Jan Høydahl  wrote:



Re: Soft commit takes 5 seconds in Solr 8.9.0

2023-09-28 Thread Jan Høydahl
> 100}

There seems to be a typo here with the "}"?
A 100 ms soft-commit time is unusual; you risk that commits pile up during
rapid indexing and cause inefficiencies. I'd increase it to at least 1000 ms.

Can you reproduce this in an IDLE system by simply adding ONE document?
What does your document look like? Number of fields, size, nested docs, etc.?
Does it happen every time or just once in a while?
Do you have access to system metrics for the server and JVM which can tell
something about their general health and load?

Jan
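
As a concrete illustration of Jan's suggestion, the soft-commit setting in
solrconfig.xml would change roughly like this (a sketch under Jan's suggested
value, not a tested drop-in config):

```xml
<!-- Sketch: raise the soft-commit interval from 100 ms to 1000 ms so that
     soft commits cannot pile up during rapid indexing. -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```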


> On 28 Sep 2023, at 13:54, John Jackson wrote:



RE: Cancelling an Async operation - Shard split

2023-09-28 Thread Hitendra Talluri
Dear Community

Is there a way to cancel a shard split operation in Solr? I couldn't find any
such option in the collection/core management APIs. I see the operation is
tracked via ZooKeeper nodes; will I be able to cancel the operation by clearing
these nodes from ZK?

Regards
Hitendra


Re: Backup from old server and Restore to new server

2023-09-28 Thread Gus Heck
Scanned this thread, apologies if I missed something, but here are a few
thoughts:

To get better advice, make it clear whether you are running Solr in Cloud mode
(a.k.a. self-managed) or Legacy mode (a.k.a. user-managed). Some quick ways to
tell:

   1. Is there an associated ZooKeeper cluster? If yes, then you are in
   cloud mode; if not, then *probably* legacy (there's a way to run ZooKeeper
   embedded, but that's not the normal setup).
   2. In the admin UI, do you see the word 'Cloud' in the left navigation
   bar? If yes, cloud; if no, legacy.

*Key concept: Solr is (normally) just a server providing access to an index
of your data. It allows you to find a link, or id for a "document" but does
not (normally) serve as a repository for your data.*

This has some implications:

   1. Solr is typically paired with one or more data repositories
   (database, file system, sharepoint, etc)
   2. Solr normally cannot reindex data all by itself. Re-indexing is the
   process of re-reading the repository, and creating a fresh index.
   3. Solr is just an index, and does not manage the process of reading the
   data from sources (exceptions like the Data Import Handler [DIH] and
   streaming expressions exist, but DIH went away in 9.x, and these are
   exceptions, not the rule).
   4. Typically *something* outside of solr sends documents to solr.
   Re-indexing is normally the process of re-triggering something to send the
   documents again.
   5. This is unlike a database which contains both the data (the table)
   and an index (PK/FK/index) of the data.
   6. Versus a database, Solr's benefit is that it is an index of the
   *words* in the text of the document rather than entire string values.

Thus (exceptional cases excluded) things you do to or in Solr don't
"trigger reindexing".

I have implied that sometimes solr can be the store for your data, which is
technically true. Unfortunately, this is tricky to get right, may
negatively impact performance, and results in long term data loss if done
wrong, so it's rarely recommended. I hope you haven't inherited this type
of problem!

Upgrading Solr across a single minor version is often simple, but
occasionally requires work. Always read release notes and test the result
before going live. Upgrading across major versions is always work. Lucene
(and therefore Solr) requires that you reindex data with each major
version. There are stopgap tools to allow an upgrade of an existing index,
but that is a temporary measure that only works for N to N+1 and you are
expected to re-index before N+2.

- Gus

-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)


Re: Cancelling an Async operation - Shard split

2023-09-28 Thread Gus Heck
Unless you are very experienced and comfortable with Solr, do not edit
zookeeper nodes directly. Things you should touch generally have support in
bin/solr or other provided tools. If you edit the wrong things you can
cause all manner of chaos, and even completely ruin the entire cluster,
requiring everything to be rebuilt from scratch.

I've not tried to stop a shard split before, so I don't know if there's a
good way to do that, but don't experiment with ZooKeeper (unless it's a
test system you don't care about).

-Gus

On Thu, Sep 28, 2023 at 11:00 AM Hitendra Talluri
 wrote:



-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)