Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread Saksham Gupta
Hi All,

I have been trying to reduce the response time of Solr Cloud (v8.10, 8
nodes). To achieve this, I have tried increasing the number of shards,
which reduces the data size on each shard and should thereby reduce
response time.


I have encountered a few questions regarding sharding strategy:

1. How do I decide the ideal number of shards? Is there a minimum or maximum
number of shards which should be used?

2. What is the minimum size of a shard below which reducing the size
further won't have any effect on the response time (because the time taken
by other factors, like data aggregation, will offset the gain)?

3. Is there some maximum limit to the size of data that should be kept in a
shard?


As of now we have 8 shards, each on a separate node, with ~25 GB of
data (15-16 million docs) on each shard. Please advise me of the
standard approaches to deciding the number of shards and the shard size.
Thanks in advance.


Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread ufuk yılmaz
My two cents:

1) Try to keep shards small enough to fit in memory, so that the entire index
can be cached: less disk access = more speed. So the number of shards depends
on the memory available on your nodes. If you still need more read throughput,
add more replicas.

2) At some point the network chatter caused by the number of shards becomes
too much, and smaller shards stop paying off.

3) 2 billion documents per shard (if this didn't change in recent versions)

-ufuk
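
To put point 3 in numbers, here is a minimal sketch. It assumes the limit meant above is Lucene's per-index cap, IndexWriter.MAX_DOCS (Integer.MAX_VALUE - 128). The function name is made up for illustration, and the 16 million figure is the upper per-shard estimate from the original question.

```python
# Lucene caps a single index (one Solr shard replica) at
# IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128 documents.
LUCENE_MAX_DOCS = (2**31 - 1) - 128


def doc_limit_usage(docs_per_shard: int) -> float:
    """Return the fraction of the per-shard document limit in use."""
    if docs_per_shard < 0:
        raise ValueError("docs_per_shard must be non-negative")
    return docs_per_shard / LUCENE_MAX_DOCS


# Per-shard document count from the original question (upper estimate).
usage = doc_limit_usage(16_000_000)
print(f"{usage:.2%} of the per-shard document limit is in use")
```

At the sizes discussed in this thread the document limit is far away, so memory and query cost are the more likely constraints.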



Re: Solr latest 9.x - SOLR_HOST

2023-09-13 Thread Ishan Chattopadhyaya
If you're sure it is a regression, please open a JIRA ticket for it and
someone can take a look. Alternatively, please feel free to submit a PR too.

Thanks!
Ishan

On Wed, 13 Sept, 2023, 8:00 am Natarajan, Rajeswari wrote:

> Additional details
>
> Running Solr instance in alpine OS
> Not running solr as service
> And it is working with solr 8.11.1 version
>
> Thanks,
> Rajeswari
>
> On 9/12/23, 10:17 PM, "Natarajan, Rajeswari" wrote:
>
> Trying to upgrade to solr 9.3.
>
> Defined below in solr.in.sh
> SOLR_HOST=
> SOLR_JETTY_HOST=0.0.0.0
>
> But still see the solr instance getting registered as localhost with
> zookeeper. Not sure what is missing?
>
> 2023-09-13 02:11:10.514 INFO (main) [] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/localhost:8983_solr
>
> Thanks,
> Rajeswari
>
> On 5/21/22, 11:06 AM, "Shawn Heisey" wrote:
>
> On 5/21/2022 8:02 AM, Shawn Heisey wrote:
> > If you have installed the solr service, then you would want to do
> > "service solr status" instead, replacing solr with whatever you
> > actually named the service. Did you install the service with the
> > service installer script? What options did you use for that? What OS
> > is this on? What is the full path of the configuration file where you
> > changed SOLR_HOST? If you haven't used the installer script, then I
> > will need details about exactly how and where you installed Solr in
> > order to know what to ask next.
>
> This might be really easy to resolve.
>
> I've been looking at the startup scripts. SOLR_HOST does not control
> what address Solr listens on. Solr 9.x only listens on 127.0.0.1 by
> default. Previous Solr versions listened on all interfaces by default.
>
> I bet if you added this line it might start working. If not, provide
> the info already requested:
>
> SOLR_JETTY_HOST=192.168.100.2
>
> You should probably also keep the SOLR_HOST you have defined.
>
> Thanks,
> Shawn


Re: Restart on a node triggers restart like impact on all the other nodes in cluster

2023-09-13 Thread Mikhail Khludnev
Hello, Rajani
Just a blind guess: the cluster may be recovering the dropped replicas onto
the remaining nodes. You probably need to call
https://solr.apache.org/guide/solr/latest/deployment-guide/cluster-node-management.html#migratereplicas
beforehand to move the replicas off the node being recycled. WDYT?

On Wed, Sep 13, 2023 at 3:29 AM rajani m wrote:

> Hi Solr Users,
>
> On Solr 9.1.1, restarting Solr on any node in the cluster triggers an
> event across all the *other* nodes that has an impact similar to
> restarting Solr on them as well. There is a dip in CPU usage, all the
> caches are emptied and warmed up again, and there are disk reads/writes
> on all the other nodes.
>
> The nodes in the cluster are usually at 40% CPU usage and 80% memory, and
> they receive requests and updates. Whenever Solr on a node is restarted,
> the shards on that node take 1-2 minutes to recover, and by the time they
> have recovered, the other nodes receive an event that causes a
> restart-like impact on them: cache cleanup, a CPU dip and so on. What can
> cause this type of behavior?
>
> Thank you,
> Rajani


-- 
Sincerely yours
Mikhail Khludnev


Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread Jan Høydahl
Hi,

There are no hard rules wrt sharding; it often comes down to measuring and
experimenting for your workload.

There is more to consider than shard size. Why are the queries slow?
How many rows do you ask for? Do you use faceting? Grouping?
You have 25 GB of data on each of the 8 nodes/shards. Now, how much RAM does
each node have, and how much RAM did you allocate to Solr/Java?
A common mistake is to allocate so much RAM/heap to Solr that nothing is left
for the operating system's disk cache in Linux.
Say you have 32 GB of physical RAM on the nodes. Then do not give 30 of those to
Solr. Instead give 8 GB to Solr and let 24 GB be available for disk caching.

Another thing to consider is whether your queries can be optimized by
rewriting them to more efficient equivalents. Sometimes, Solr-level caches can
also help.

Wrt shard efficiency: If you already have 8 shards, it is not much more
expensive to go to 16, but you increase the risk of a single failure affecting
your requests...

Jan
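
The heap versus disk cache arithmetic above can be sketched as follows. This is a rough illustration only: the function names are invented here, and the model ignores other processes and JVM off-heap memory, so treat the output as an upper bound rather than a measurement.

```python
def page_cache_gb(physical_ram_gb: float, heap_gb: float) -> float:
    """RAM left for the OS disk cache once the Java heap is taken out.

    Simplification: ignores other processes and JVM off-heap overhead.
    """
    if heap_gb > physical_ram_gb:
        raise ValueError("heap cannot exceed physical RAM")
    return physical_ram_gb - heap_gb


def cacheable_fraction(index_gb: float, physical_ram_gb: float, heap_gb: float) -> float:
    """Fraction of the on-disk index that could sit in the disk cache (capped at 1.0)."""
    return min(1.0, page_cache_gb(physical_ram_gb, heap_gb) / index_gb)


# Numbers from the thread: 25 GB of index per node, 32 GB RAM as the example.
for heap in (30, 8):
    frac = cacheable_fraction(25, 32, heap)
    print(f"heap={heap} GB -> cache={page_cache_gb(32, heap)} GB, "
          f"about {frac:.0%} of the index cacheable")
```

With a 30 GB heap only a small slice of the 25 GB index can stay cached, while an 8 GB heap leaves room for nearly all of it.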



Re: Solr latest 9.x - SOLR_HOST

2023-09-13 Thread Jan Høydahl
Are you modifying the correct solr.in.sh file?
How do you install solr?

Can you by any chance reproduce this in a new, tiny, fresh 9.3 cluster, to rule 
out old cruft remaining from an existing install?

Jan



Re: Restart on a node triggers restart like impact on all the other nodes in cluster

2023-09-13 Thread Shawn Heisey

On 9/12/23 18:28, rajani m wrote:

> Solr 9.1.1 version, upon restarting solr on any node in the cluster, a
> unique event is triggered across all the *other* nodes in the cluster that
> has an impact similar to restarting solr on all the other nodes in the
> cluster. There is dip in the cpu usage, all the caches are emptied and
> warmed up, there are disk reads/writes on all the other nodes.


How much RAM is in each node?  How much is given to the Java heap?  Are 
you running more than one Solr instance on each node?  How much disk 
space do the indexes on each node consume?


What are the counts of:

* Nodes
* Collections
* Shards per collection
* Replica count per shard
* Documents per shard

There is sometimes some confusion about replica count.  I've seen people 
say they have "one shard and one replica" when the right way to state it 
is that the replica count is two.


If the counts above are large (meaning that you have a LOT of cores) 
then restarting a node can be very disruptive to the cloud as a whole. 
See this issue from several years ago where I explored this:


https://issues.apache.org/jira/browse/SOLR-7191

The issue has been marked as resolved in version 6.3.0, but no code was 
modified, and as far as I know, the problem still exists.


It's worth noting that in my tests for that issue, the collections were 
empty.  For collections that actually have data, the problem will be worse.


If there are a lot of adds/updates/deletes happening, then the delta 
between the replicas might exceed the threshold for transaction log 
recovery.  Solr may be doing a full replication to the cores on the 
restarted node.  But I would expect that to only affect the shard 
leaders, which are the source for the replicated data.


Thanks,
Shawn
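
The counts asked for above can be read off the JSON returned by the Collections API CLUSTERSTATUS action. The sketch below only summarizes an already-parsed response; fetching it from /solr/admin/collections?action=CLUSTERSTATUS is left out. The function name and the sample data are made up for illustration, and replicas are counted the way described above, so a shard with a leader and one other copy has a replica count of two.

```python
from typing import Any, Dict


def summarize_cluster(status: Dict[str, Any]) -> Dict[str, Any]:
    """Count nodes, shards and replicas in a parsed CLUSTERSTATUS response."""
    cluster = status.get("cluster", {})
    summary: Dict[str, Any] = {
        "nodes": len(cluster.get("live_nodes", [])),
        "collections": {},
    }
    total_cores = 0
    for name, coll in cluster.get("collections", {}).items():
        shards = coll.get("shards", {})
        # Every entry under "replicas" is one core, including the leader.
        replica_counts = [len(s.get("replicas", {})) for s in shards.values()]
        summary["collections"][name] = {
            "shards": len(shards),
            "replicas_per_shard": replica_counts,
        }
        total_cores += sum(replica_counts)
    summary["total_cores"] = total_cores
    return summary


# Hypothetical, heavily trimmed response used only to exercise the function.
SAMPLE_STATUS = {
    "cluster": {
        "live_nodes": ["host1:8983_solr", "host2:8983_solr"],
        "collections": {
            "demo": {
                "shards": {
                    "shard1": {"replicas": {"core_node1": {}, "core_node2": {}}},
                    "shard2": {"replicas": {"core_node3": {}, "core_node4": {}}},
                }
            }
        },
    }
}

print(summarize_cluster(SAMPLE_STATUS))
```

A large total_cores value is the situation where a node restart can become disruptive for the whole cloud.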



Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread Walter Underwood
This is all great advice.

There is no optimal number of shards. I’ve run clusters with 4 shards, we 
currently have one cluster with 96 shards and one with 320 shards. The next one 
we build out will probably not be sharded.

With long queries, I’ve usually seen a roughly linear speedup with sharding. 
Double the shards, halve the response time.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread Jan Høydahl
Yes, if your average query touches too many documents (such as huge OR
queries) and has some processing that needs to touch each hit (scoring, result
transformation, highlighting, etc.), then simply splitting the elephant into
shards may help. Or if you ask for 100 facets and your facets are slow, you
could perhaps use facet.threads to speed up that part. Or if you use grouping,
you could try collapse instead. And so on. We need to know more about your
data, queries and use case to say what the cure might be.

Jan
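
As a rough illustration of the two suggestions above, the sketch below builds a request that sets facet.threads and uses the collapse query parser as a filter. The helper name and the field names (brand, color, product_id) are hypothetical, and the thread count is an arbitrary example, not a recommendation.

```python
from urllib.parse import urlencode


def build_params(query, facet_fields=(), facet_threads=None, collapse_field=None):
    """Build a Solr /select query string with optional faceting and collapsing."""
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field; facet.threads lets Solr
        # compute the per-field facets in parallel.
        params.extend(("facet.field", field) for field in facet_fields)
        if facet_threads is not None:
            params.append(("facet.threads", str(facet_threads)))
    if collapse_field:
        # The collapse query parser keeps one document per field value.
        params.append(("fq", "{!collapse field=%s}" % collapse_field))
    return urlencode(params)


print(build_params("*:*", ["brand", "color"], facet_threads=4,
                   collapse_field="product_id"))
```

Note that collapse returns a flat result list with one document per group, so the response shape differs from grouping; whether that is acceptable depends on the use case.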



Join and Distributed Search

2023-09-13 Thread Walter Underwood
We have a sharded collection that joins with a non-sharded collection. The 
non-sharded collection has a replica on every node. Does the join automatically 
choose the local replica or do we need to pass in a shard preference param?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Configure zk to not send requests to new node for x minutes

2023-09-13 Thread rajani m
Hi Solr Users,

Is there a way to stop sending search queries to a newly joined
node/respective shards for a few minutes?

Thanks,
Rajani


build issue

2023-09-13 Thread Nope nope
I am currently trying to run the command \gradlew build and it fails with:
"Task :solr:documentation:changesToHtml FAILED". I am using Java 17. Has
anyone had this issue before, and if so, how did you fix it? Thank you


Re: Restart on a node triggers restart like impact on all the other nodes in cluster

2023-09-13 Thread rajani m
Hi,

  Thank you for looking into this one.

I tailed the logs and figured out that the newly restarted node is the
culprit: it fails search queries for some time after startup, with the
errors shown below. The query failures on that node and its shard(s) make the
overall aggregated search query take about 7-10 seconds, which causes latency
spikes. As a consequence, the load balancer sends fewer requests, so the
other nodes see a dip in search qps and therefore a dip in their resource
usage.

When a node goes down and recovers fully, zk immediately starts sending
search queries to it, and it fails queries for some time (about 2 minutes)
before it actually returns 200 responses. What is the cause of the first
error, and is there any way to avoid it?

Is there a way to configure zk to not send traffic to that node's shards
for a few minutes, so that it can be warmed up with some queries before it
starts receiving live queries?
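
For the warm-up part only, one possible approach is to send a few representative queries straight to the recovered core with distrib=false, so they are not fanned out to other shards. This is a sketch under assumptions: the helper name, base URL and query list are hypothetical, and it does nothing to hold back live traffic, which is the open question above.

```python
from urllib.parse import urlencode


def warmup_urls(base_url, core, queries, rows=10):
    """Build /select URLs that hit a single core directly (distrib=false)."""
    urls = []
    for q in queries:
        params = urlencode({"q": q, "rows": rows, "distrib": "false"})
        urls.append(f"{base_url.rstrip('/')}/{core}/select?{params}")
    return urls


# Core name taken from the log line in this message; host is an assumption.
for url in warmup_urls("http://localhost:8983/solr",
                       "v9-web_shard34_replica_n771",
                       ["*:*"]):
    print(url)
```

The URLs could then be requested with any HTTP client once the core reports as active.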


Error -

2023-09-13 17:50:34.277 ERROR (qtp517787604-178) [c:v9-web s:shard34 r:core_node772 x:v9-web_shard34_replica_n771] o.a.s.h.RequestHandlerBase java.util.ConcurrentModificationException => java.util.ConcurrentModificationException
    at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1221)
java.util.ConcurrentModificationException: null
    at java.util.HashMap.computeIfAbsent(HashMap.java:1221) ~[?:?]
    at org.apache.solr.schema.IndexSchema.getPayloadDecoder(IndexSchema.java:2118) ~[?:?]
    at org.apache.solr.search.ValueSourceParser$66.parse(ValueSourceParser.java:899) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:434) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:264) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:252) ~[?:?]
    at org.apache.solr.search.ValueSourceParser$17.parse(ValueSourceParser.java:349) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:434) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94) ~[?:?]
    at org.apache.solr.search.QParser.getQuery(QParser.java:188) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:264) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:252) ~[?:?]
    at org.apache.solr.search.ValueSourceParser$16.parse(ValueSourceParser.java:338) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:434) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94) ~[?:?]
    at org.apache.solr.search.QParser.getQuery(QParser.java:188) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:384) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:264) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:252) ~[?:?]
    at org.apache.solr.search.ValueSourceParser$17.parse(ValueSourceParser.java:349) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:434) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:264) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSourceList(FunctionQParser.java:252) ~[?:?]
    at org.apache.solr.search.ValueSourceParser$16.parse(ValueSourceParser.java:338) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:434) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:272) ~[?:?]
    at org.apache.solr.search.ValueSourceParser$DoubleParser.parse(ValueSourceParser.java:1646) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:434) ~[?:?]
    at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:94) ~[?:?]
    at org.apache.solr.search.QParser.getQuery(QParser.java:188) ~[?:?]
    at org.apache.solr.search.ExtendedDismaxQParser.getMultiplicativeBoosts(ExtendedDismaxQParser.java:532) ~[?:?]

and

query params  =>
org.apache.lucene.index.ExitableDirectoryReader$ExitingReaderException: The request took too long to iterate over point values. Timeout: timeoutAt: 130698068079 (System.nanoTime(): 130791560015), PointValues=org.apache.lucene.util.bkd.BKDReader@2b77d603
    at org.apache.lucene.index.ExitableDirectoryReader$ExitablePointValues.checkAndThrow(ExitableDirectoryReader.java:482)
org.apache.lucene.index.ExitableDirectoryReader$ExitingReaderException: The request took too long to iterate over point values. Timeout: timeoutAt: 130698068079 (System.nanoTime(): 130791560015), PointValues=org.apache.lucene.util.bkd.BKDReader@2b77d603
    at org.apache.lucene.index.ExitableDirectoryReader$ExitablePointValues.checkAndThrow(ExitableDirectoryReader.java:482) ~[?:?]
    at org.apache.lucene.index.ExitableDirectoryReader$ExitablePointValues.<init>(ExitableDirectoryReader.java:471) ~[?:?]


Re: Join and Distributed Search

2023-09-13 Thread Mikhail Khludnev
Hello Walter.
I think the former is the case: join picks the local replica or fails. I don't
think the join query (unless it is crossCollection) takes shard preference
into account.

-- 
Sincerely yours
Mikhail Khludnev
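
For readers unfamiliar with the syntax under discussion, here is a minimal sketch of a join against another collection via fromIndex, with shards.preference added as an optional request parameter. The helper names, field names and the collection name "lookup" are hypothetical. As noted above, it is not established that the join side honors shards.preference, so the parameter is shown only to make the question concrete.

```python
from urllib.parse import urlencode


def join_query(from_field, to_field, from_index, inner_query):
    """Build a {!join} query that reads join keys from another collection."""
    return (f"{{!join from={from_field} to={to_field} "
            f"fromIndex={from_index}}}{inner_query}")


def search_params(join_q, prefer_local=False):
    """URL-encode the request, optionally asking for node-local replicas."""
    params = {"q": join_q}
    if prefer_local:
        params["shards.preference"] = "replica.location:local"
    return urlencode(params)


q = join_query("id", "parent_id", "lookup", "type:book")
print(q)
print(search_params(q, prefer_local=True))
```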


Re: Join and Distributed Search

2023-09-13 Thread Walter Underwood
As I said, these are two different collections. —wunder



Re: Solr latest 9.x - SOLR_HOST

2023-09-13 Thread Natarajan, Rajeswari
Nope, it turned out that I had added the setting inside the wrong conditional
block in solr.in.sh

Thanks,
Rajeswari
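
A quick way to catch this kind of misconfiguration is to check which host name the node registered under in ZooKeeper. The sketch below parses the ZkController log line quoted in this thread. The function names are made up for illustration, and the regular expression assumes the host:port_context node name layout seen in that line.

```python
import re

# Matches e.g. "/live_nodes/localhost:8983_solr".
_LIVE_NODE = re.compile(
    r"/live_nodes/(?P<host>[^:\s/]+):(?P<port>\d+)_(?P<context>\S+)"
)
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def registered_node(log_line):
    """Return (host, port) from a 'Register node as live' log line, or None."""
    match = _LIVE_NODE.search(log_line)
    if match is None:
        return None
    return match["host"], int(match["port"])


def registered_as_loopback(log_line):
    """True if the node registered under a loopback name."""
    node = registered_node(log_line)
    return node is not None and node[0] in LOOPBACK_NAMES


LINE = ("2023-09-13 02:11:10.514 INFO (main) [] o.a.s.c.ZkController "
        "Register node as live in ZooKeeper:/live_nodes/localhost:8983_solr")
print(registered_node(LINE), registered_as_loopback(LINE))
```

If the check reports a loopback name, the SOLR_HOST assignment in solr.in.sh is probably not being reached at startup.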



Re: Solr latest 9.x - SOLR_HOST

2023-09-13 Thread Natarajan, Rajeswari
Yes, the solr.in.sh is modified, and it contains different logic for setting
the hostname depending on the environment. Solr is deployed first, and then
the modified solr.in.sh is copied over.
As stated in the other email, this was user error.

Thanks,
Rajeswari



Re: Optimal Sharding Strategy for Solr Cloud v8.10

2023-09-13 Thread Bernd Fehling

@Walter,

how on earth are you monitoring all vital Solr Cloud Parameters for 320 shards?

Regards,
Bernd



--
Bernd Fehling                Bielefeld University Library
Dipl.-Inform. (FH)           LibTec - Library Technology
Universitätsstr. 25          and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060        bernd.fehling(at)uni-bielefeld.de
                             https://www.ub.uni-bielefeld.de/~befehl/

BASE - Bielefeld Academic Search Engine - www.base-search.net