Re: Load on Solr Nodes due to High GC

2024-06-20 Thread Deepak Goel
Can you please share the hardware details (server type, CPU speed and type,
disk speed and type) and the GC configuration? Also, please post the results
of top and iotop if you can.
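
For example, something along these lines would capture what I'm after (a rough
sketch, assuming a Linux host with iotop installed):

top -b -n 1 | head -n 25                        # overall CPU/memory and top processes
iotop -b -o -n 3                                # needs root; shows only processes doing I/O
ps -ef | grep -i solr | grep -o -- '-XX[^ ]*'   # the GC flags Solr was actually started with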


Deepak
"The greatness of a nation can be judged by the way its animals are treated
- Mahatma Gandhi"

+91 73500 12833
deic...@gmail.com

LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Thu, Jun 20, 2024 at 11:24 AM Oleksandr Tkachuk 
wrote:

> Use tlog+pull replicas; they will improve the situation significantly.
>
> Thu, 20 Jun 2024, 07:27 Saksham Gupta :
>
> > Hi All,
> >
> > We have been facing extra load incidents due to higher GC count and GC
> > time, causing higher response times and timeouts.
> >
> > Solr Cloud Cluster Details
> >
> > We use Solr Cloud v8.10 [with Java 8 and G1 GC] with 8 shards, where each
> > shard is present on a single VM with 16 cores and 50 GB RAM. The size of
> > each shard is ~28 GB and the Solr heap is 16 GB [heap utilization only
> > for the filter, document, and queryResults caches, each of size 512].
> >
> > Problem Details
> >
> > We pause indexing at 11 AM during peak searching hours. Normally the
> > system remains stable during the peak hours, but when the document
> > update count on Solr is higher before peak hours [between 5.30 AM and
> > 11 AM], we face multiple load issues. The GC count and GC time increase
> > and CPU is consumed in GC itself, thereby increasing the load and
> > response time of the system. To mitigate this, we recently increased the
> > RAM on the servers [to 50 GB from 42 GB previously] to reduce the I/O
> > wait from loading the Solr index into memory multiple times. Taking a
> > step further, we also increased the Solr heap from 12 to 16 GB [also
> > tried other combinations like 14 GB, 15 GB, 18 GB]; although we found
> > some reduction in load issues due to lower I/O wait, the issue still
> > recurs when higher indexing is done.
> >
> > We have explored a few options like expungeDeletes, which may help
> > reduce the deleted-documents percentage, but that cannot be executed
> > close to peak hours, as it increases I/O wait, which further spikes the
> > load and response time of Solr significantly.
> >
> > 1. Apart from changing the expungeDeletes timing, is there another
> > option which we can try to mitigate this problem?
> >
> > 2. Approximately 60 million documents are updated each day, i.e. ~30% of
> > the complete Solr index is modified each day while serving ~20 million
> > search requests. Would appreciate any knowledge on how to handle such
> > high indexing + searching traffic during peak hours.
>


Re: Load on Solr Nodes due to High GC

2024-06-20 Thread matthew sporleder
Are you having iowait, gc pauses, or something else? Do you commit often or in 
one big batch? 
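
For context, the difference I'm asking about is roughly this (host and
collection name are placeholders):

# explicit hard commit on every batch -- expensive if batches are frequent
curl 'http://localhost:8983/solr/mycoll/update?commit=true' \
  -H 'Content-Type: application/json' -d '[{"id":"1"}]'

# let Solr fold commits together instead (commitWithin is in milliseconds)
curl 'http://localhost:8983/solr/mycoll/update?commitWithin=60000' \
  -H 'Content-Type: application/json' -d '[{"id":"1"}]'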

> On Jun 20, 2024, at 12:26 AM, Saksham Gupta 
>  wrote:
> 
> [quoted message trimmed; see the quote in the previous message]


Zookeeper KeeperErrorCode = NodeExists

2024-06-20 Thread Sergio García Maroto
Hi All.

I am facing a weird issue while upgrading Solr 8.11 to Solr 9.
I have everything up and running, passing all kinds of tests (unit and
integration) in my current CD process.

I have a cluster of 3 machines on SolrCloud and it's all good and working.
The problem happens when machines are restarted: one or two servers of the
cluster can't connect to ZooKeeper, even though ZooKeeper reports as healthy
and stable. If I restart Solr, the server connects back to the cluster and
becomes healthy.

I checked the logs and everything seems normal, except that the servers which
try to connect to the cluster fail on start with the error below.
I tried delaying the start of Solr a bit just in case, but no luck.
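
For what it's worth, the live_nodes entries can be listed with the bundled ZK
tooling, something like this (ZooKeeper hosts are placeholders):

bin/solr zk ls /live_nodes -z zk1:2181,zk2:2181,zk3:2181
# if the old ephemeral node for server03 is still listed right after the
# restart, it may simply not have expired yet (ZK session / zkClientTimeout)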

Any help much appreciated.
Sergio

2024-06-20 12:56:42.944 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated
live nodes from ZooKeeper... (0) -> (2)
2024-06-20 12:56:43.003 INFO  (main) [   ]
o.a.s.c.DistributedClusterStateUpdater Creating
DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr
will be using Overseer based cluster state updates.
2024-06-20 12:56:43.056 INFO  (main) [   ] o.a.s.c.ZkController Publish
node=server03:8983_solr as DOWN
2024-06-20 12:56:43.088 INFO  (main) [   ] o.a.s.c.ZkController Register
node as live in ZooKeeper:/live_nodes/server03:8983_solr
2024-06-20 12:56:43.111 ERROR (main) [   ] o.a.s.c.ZkController  =>
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:125) ~[?:?]
at
org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1778) ~[?:?]
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1650) ~[?:?]
at
org.apache.solr.common.cloud.SolrZkClient.lambda$multi$12(SolrZkClient.java:781)
~[?:?]
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:70)
~[?:?]
at
org.apache.solr.common.cloud.SolrZkClient.multi(SolrZkClient.java:781)
~[?:?]
at
org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:1211)
~[?:?]


Solr replication delays in IndexFetcher

2024-06-20 Thread Marcus Bergner
Hi,
I'm using a traditional master/replica Solr (8.11) setup and I'm trying to tune
the autoCommit and autoSoftCommit maxTime settings on the Solr master and the
pollInterval on the replicas to achieve better overall indexing throughput
while still maintaining an acceptably low indexing latency on the replicas. The
indexing latencies on the replicas are much longer than I would expect and I
don't understand why, so I'm hoping someone here might have some insights on
what the possible cause is and what can be done about it.

On a test environment with a large amount of test data already indexed and
replicated, I make one small update which causes a couple of documents in 3
Solr cores to be updated (one update request per core sent to Solr's API).
The Solr master log file shows all three /update requests coming in at
13:10:30. The 3 indexing requests are all done WITHOUT an explicitly specified
"commit=true" or "softCommit=true", i.e. only the auto commit max times
specified in solrconfig.xml should affect when commits take place.
Currently the autoCommit maxTime is set to 20000 and the autoSoftCommit maxTime
is 2000, but I have also tried higher autoCommit maxTime values with similarly
confusing results.

I have a pollInterval of 00:00:10 on the replica. When making the above index 
updates and issuing search queries against the replica it takes several minutes 
before I get a corresponding search hit from the replica. In some cases 3-4 
minutes, sometimes a bit less.

I see the following strange behavior in the logs of the Solr replica. The
replica seems to notice something has changed after 25-26 seconds (OK,
assuming autoCommit maxTime is 20 seconds and pollInterval is 10 seconds):

2024-06-20 13:10:56.768 INFO  (indexFetcher-81-thread-1) [   ] 
o.a.s.h.IndexFetcher Starting download (fullCopy=false) to 
NRTCachingDirectory(MMapDirectory@/data0/solr8/xlcore/data/index.20240620131056059
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@25fb1467; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)
... most files being skipped, "Fetched and wrote" 15 files
2024-06-20 13:10:56.841 INFO  (indexFetcher-81-thread-1) [   ] 
o.a.s.h.IndexFetcher Total time taken for download 
(fullCopy=false,bytesDownloaded=225681) : 0 secs (null bytes/sec) to 
NRTCachingDirectory(MMapDirectory@/data0/solr8/xlcore/data/index.20240620131056059
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@25fb1467; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)

So far so good, but this is only one of the three cores that was updated at 
13:10:30. The second core is processed much later:

2024-06-20 13:11:12.370 INFO  (indexFetcher-89-thread-1) [   ] 
o.a.s.h.IndexFetcher Starting download (fullCopy=false) to 
NRTCachingDirectory(MMapDirectory@/data0/solr8/defcore/data/index.20240620131056964
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@25fb1467; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)
...
2024-06-20 13:11:12.409 INFO  (indexFetcher-89-thread-1) [   ] 
o.a.s.h.IndexFetcher Total time taken for download 
(fullCopy=false,bytesDownloaded=281548) : 15 secs (18769 bytes/sec) to 
NRTCachingDirectory(MMapDirectory@/data0/solr8/defcore/data/index.20240620131056964
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@25fb1467; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)

and the third one even later:

2024-06-20 13:11:35.468 INFO  (indexFetcher-91-thread-1) [   ] 
o.a.s.h.IndexFetcher Starting download (fullCopy=false) to 
NRTCachingDirectory(MMapDirectory@/data0/solr8/parentcore/data/index.20240620131109083
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@25fb1467; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)
...
2024-06-20 13:11:35.498 INFO  (indexFetcher-91-thread-1) [   ] 
o.a.s.h.IndexFetcher Total time taken for download 
(fullCopy=false,bytesDownloaded=221332) : 26 secs (8512 bytes/sec) to 
NRTCachingDirectory(MMapDirectory@/data0/solr8/parentcore/data/index.20240620131109083
 lockFactory=org.apache.lucene.store.NativeFSLockFactory@25fb1467; 
maxCacheMB=48.0 maxMergeSizeMB=4.0)

How can I get all updated cores to be replicated within 1 autoCommit maxTime +
1 pollInterval time frame, or at the very least 2 autoCommit maxTime + 1
pollInterval? Right now it looks like one core gets replicated, then there are
15-25 seconds of nothing happening, then another core is replicated, then
another 15-25 seconds of nothing, etc.
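
For reference, a fetch can also be triggered manually per core right after an
update, to see whether the delay is in the polling itself or in the fetch; a
rough sketch (replica host is a placeholder, commands are the standard
/replication handler ones):

for core in xlcore defcore parentcore; do
  curl "http://replica-host:8983/solr/$core/replication?command=fetchindex&wt=json"
done
# and compare index versions between master and replica per core
curl "http://replica-host:8983/solr/xlcore/replication?command=details&wt=json"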

Kind regards,

Marcus


are bots DoS'ing anyone else's Solr?

2024-06-20 Thread Dmitri Maziuk

Hi all,

the latest mole in the eternal whack-a-mole game with web crawlers 
(GPTBot) DoS'ed our Solr again & I took a closer look at the logs. 
Here's what it looks like is happening:


- the bot is hitting a URL backed by Solr search and starts following 
all permutations of facets and "next page"s at a rate of 60+ hits/second.
- Solr is not returning the results fast enough and the bot is dropping 
connections.
- An INFO message is logged: jetty is "unable to write response, client 
closed connection or we are shutting down" -- IOException on the 
OutputStream: Closed.


These go on for a while until:

java.nio.file.FileSystemException: 
$PATH_TO\server\solr\preview_shard1_replica_n2\data\tlog\buffer.tlog.800034318988100: 
The process cannot access the file because it is being used by another 
process.

 -- Different file suffix # on every one of those

And eventually an update comes in and fails with

ERROR (qtp173791568-23140) [c:preview s:shard1 r:core_node4 
x:preview_shard1_replica_n2] o.a.s.h.RequestHandlerBase 
org.apache.solr.common.SolrException: Error logging add => 
org.apache.solr.common.SolrException: Error logging add
  at 
org.apache.solr.update.TransactionLog.write(TransactionLog.java:420)

org.apache.solr.common.SolrException: Error logging add

Caused by: java.io.IOException: There is not enough space on the disk
...

At this point Solr is hosed. Admin page shows "no collections available" 
but does respond to queries; all queries from the website client (.NET) 
are failing.


This is Solr 8.11.2 on Windows Server 2022 / Corretto JVM 11.

So, questions: has anyone else seen this?

Who is "buffer.tlog.xyz", do they have a size/# files cap, and are they 
not getting GC'ed fast enough under this kind of load?


The 400 GB disk is normally ~90% empty, so "not enough space on the disk" does
not sound right. The logs do pile up when this happens and Java starts dumping
gigabytes of stack traces, but they add up to a few hundred MB at most. There
certainly was *some* free space when I got to it, and it's back to 99% free
after a Solr restart.


Any suggestions as to how to deal with this?

(Obviously, I added "Disallow: /" to robots.txt for GPTBot, but that's 
only good until the next bot comes along.)


TIA
Dima



Re: are bots DoS'ing anyone else's Solr?

2024-06-20 Thread matthew sporleder
Solr allows you to go to page=1000 or whatever, and bots will follow it, but
there is rarely any business value in going that deep.

You can come up with a scheme for cursorMarks + caching (faster than deep
paging), or just stop showing results past page 5-10.
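
Cursor-based paging looks roughly like this (a sketch; collection, fields and
sort are placeholders, and the sort has to end on the uniqueKey field):

# first page: cursorMark=*
curl 'http://localhost:8983/solr/mycoll/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=20' \
  --data-urlencode 'sort=score desc, id asc' \
  --data-urlencode 'cursorMark=*'
# the response contains nextCursorMark; pass that value back as cursorMark
# for the next page instead of incrementing start=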

On Thu, Jun 20, 2024 at 11:39 AM Dmitri Maziuk  wrote:
> [quoted message trimmed; see the original message above]


Re: are bots DoS'ing anyone else's Solr?

2024-06-20 Thread Ohms, Jannis
I work in a library, so yes, we have a similar problem. Our Solr is used
indirectly by a web application running on another server.

We use https://wiki.archlinux.org/title/fail2ban to block IPs which exceed a
given number of requests per minute.
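
A rough sketch of such a jail (the filter name, log path and thresholds are
placeholders; a matching filter.d definition with a failregex for your access
log is also needed):

cat > /etc/fail2ban/jail.d/bot-flood.local <<'EOF'
[bot-flood]
enabled  = true
port     = http,https
filter   = bot-flood
logpath  = /var/log/nginx/access.log
findtime = 60
maxretry = 120
bantime  = 3600
EOF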

From: Dmitri Maziuk
Sent: Thursday, 20 June 2024 17:38:27
To: users@solr.apache.org
Subject: are bots DoS'ing anyone else's Solr?

[quoted message trimmed; see the original message above]



Re: are bots DoS'ing anyone else's Solr?

2024-06-20 Thread Imran Chaudhry
+1 for fail2ban

@Dmitri Maziuk if your Solr is behind Apache httpd then you may be interested
in mod_evasive, which worked well for XML-RPC attacks against WordPress.

You can combine it with fail2ban:

https://ejectdisc.org/2015/08/08/admin-a-wordpress-site-running-on-debian-linux-learn-how-to-protect-it-from-dos-xmlrpc-attacks-and-similar/

It sounds like your Solr is publicly exposed to the web. Yikes. An alternative
is to change the port that it's running on to something non-standard and
random. These bots scan for well-known ports.
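
That's a one-line change in solr.in.sh (the value below is just an arbitrary
example):

# solr.in.sh
SOLR_PORT=18983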

That's "security through obscurity" though and you should ideally be
running Solr behind some kind of "web application firewall".


On Thu, Jun 20, 2024, 4:56 PM Ohms, Jannis 
wrote:

> [quoted messages trimmed; see the original messages above]


Re: are bots DoS'ing anyone else's Solr?

2024-06-20 Thread Dmitri Maziuk

On 6/20/24 11:17, Imran Chaudhry wrote:
...

If I were running on Linux I'd have them blocked with the iptables 'recent'
module too... and if I were running on bare metal I'd put it on an SSD-cached
ZVOL and likely not see Java choke on NIO under load. But I am not. :(



> It sounds like your Solr is publicly exposed to the web.

No:

> - the bot is hitting a URL backed by Solr search and starts following
> all permutations of facets and "next page"s at a rate of 60+ hits/second.

By "URL backed by Solr search" I meant a page on the website.

But anyway, it looks like it's not just us, it's a solr feature. Good to 
know.


Thanks all
Dima


Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-06-20 Thread Michael Gibney
I've been unable to reproduce anything like this behavior. If you're
really getting queryResultCache hits for these, then the field
type/etc. of the field you're querying on shouldn't make a difference;
the type/etc. of the return field (product_id) would be more likely to
matter. I wonder what would happen if you fully bypassed the query
cache (i.e., `q={!cache=false}product_type:"1"`)?
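
If you want to try that from the shell, something like this should work
(collection name as in your tests; --data-urlencode takes care of encoding the
local-params syntax):

curl 'http://localhost:8983/solr/XXX/select' \
  --data-urlencode 'q={!cache=false}product_type:"1"' \
  --data-urlencode 'fl=product_id' \
  --data-urlencode 'start=0' \
  --data-urlencode 'rows=50' \
  --data-urlencode 'wt=json'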

I recall that previously you had a very large number of dynamic
fields. Is that the case here as well? And if so, are the dynamic
fields mostly stored? docValues?



On Fri, Jun 14, 2024 at 7:29 AM Oleksandr Tkachuk  wrote:
>
> Initial data:
> Doc count: 1793026
> Field: "product_type", point int, indexed true, stored false,
> docvalues true. Values:
>  "facet_fields":{
>   "product_type":["3",1069282,"2",710042,"1",13702]
> },
> Single shard, single instance.
>
> # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=51'
> Summary:
>   Total:0.6374 secs
>   Slowest:  0.0043 secs
>   Fastest:  0.0003 secs
>   Average:  0.0006 secs
>   Requests/sec: 15688.5755
>
> # ./hey_linux_amd64 -n 10000 -c 10 -T "application/json"
> 'http://localhost:8983/solr/XXX/select?fl=product_id&wt=json&q=product_type:"1"&start=0&rows=50'
> Summary:
>   Total:101.3246 secs
>   Slowest:  0.2048 secs
>   Fastest:  0.0564 secs
>   Average:  0.1007 secs
>   Requests/sec: 98.6927
>
>
> 1) I've already played with queryResultWindowSize and
> queryResultMaxDocsCached by setting different high and low values, and
> this is probably not what I'm looking for since it made at most a few
> milliseconds of difference in query performance
> 2) Checked on different versions of solr (9.6.1 and 8.7.0) - no
> significant changes
> 3) Tried changing the field type to string - zero performance changes
> 4) In both cases I see successful lookups in queryResultCache
> 5) Enabling documentCache solves the problem in this case (rows<=50),
> but introduces many other performance issues so it doesn't seem like a
> viable option.


Re: How to bind embedded zookeeper to specific interface/ip?

2024-06-20 Thread Chris Hostetter


For some historic reasons, Solr has always explicitly overridden the 
`clientPortAddress` -- but as of a few versions ago, there is a Solr 
setting (SOLR_ZK_EMBEDDED_HOST) that can be used to override solr's 
override...

https://solr.apache.org/guide/solr/latest/deployment-guide/taking-solr-to-production.html#security-considerations
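
i.e., something along these lines in solr.in.sh (just a sketch; use whatever
interface/address you want embedded ZK to bind to):

# solr.in.sh
SOLR_ZK_EMBEDDED_HOST=127.0.0.1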


If you're familiar with Java code, the code Solr uses when instructed to
run the embedded ZK server (and the logic for how that server is configured)
can be found in SolrZkServer ...

https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/cloud/SolrZkServer.java#L85-L122



-Hoss
http://www.lucidworks.com/


Re: 150x+ performance hit when number of rows <= 50 in a simple query

2024-06-20 Thread Oleksandr Tkachuk
FYI: there is a solution in the last paragraph, but I still ran your tests,
since the solution was found by trial and error ("cut and try") and I don't
have a deep understanding of it.

> I wonder what would happen if you fully bypassed the query cache (i.e.,
> `q={!cache=false}product_type:"1"`)?
It does not help; there is not even a millisecond of difference between the two cases.

>I recall that previously you had a very large number of dynamic fields. Is 
>that the case here as well? And if so, are the dynamic fields mostly stored? 
>docValues?
This is another collection, I’ll get to the one with many many fields later :))
If this is the ~correct way to count the number of fields, then this
collection has the following number of fields:
curl -s "http://localhost:8983/solr/XXX/admin/luke?numTerms=0"; | grep
'"type"' | wc -l
121
Of these, 88 have docvalues enabled and 33 stored.

As for the two fields used in the query, here's how they are defined in the schema:
[the field definitions were stripped by the mailing list archive]

Changing fl= to something like a string field with stored=true without
docvalues results in zero changes.
I also tried this simple query on string-type fields (copying the
field) and got the same result. I also tried it on fields where the
cardinality was different - the spread was not 150x, but it was still
often noticeable. In addition, I still do not fully understand the
logic of this behavior
("product_type":["3",1069282,"2",710042,"1",13702]) if I do:
1) q=product_type:"1" rows=50 - qtime 150ms
2) q=product_type:"1" rows=51 - qtime 0ms
3) q=product_type:"2" rows=50 - qtime 3ms
4) q=product_type:"2" rows=51 - qtime 0ms
5) q=product_type:"3" rows=50 - qtime 1ms
6) q=product_type:"3" rows=51 - qtime 0ms
I checked on other fields and got the same behavior - the fewer
documents that contain a given value, the slower the query becomes.
If I can provide any more information, I will be glad.

The problem was solved by turning off enableLazyFieldLoading. I am
very surprised that this functionality is still in effect when the
document cache is disabled; I thought that this parameter was intended
only for it. In addition, we saw an improvement in average and
95th-percentile latency on many other types of queries, as well as some
reduction in CPU load. Are there any consequences or disadvantages to
this decision? If not, then perhaps this problem is worth paying
attention to.
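
For anyone wanting to try the same: the flag lives in the <query> section of
solrconfig.xml, and if I'm not mistaken it can also be changed at runtime via
the Config API, roughly like this (collection name as above):

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/XXX/config' \
  -d '{"set-property": {"query.enableLazyFieldLoading": false}}'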

On Thu, Jun 20, 2024 at 10:13 PM Michael Gibney
 wrote:
> [quoted message trimmed; see the previous messages in this thread]