Skimming this thread, I don't see any mention of what seems like the
most likely culprit to me:
Solr 9.3's refactoring of the timeAllowed functionality hamstrung *MOST*
of the code paths that timeAllowed is really useful for (spellcheck, facets,
synonym expansion, etc...) by disabling ExitableDirectoryReader
Thanks for the detailed reply!
I'm fairly sure we don't have any custom components - we literally just
grab the tarball from apache.org and install it. We use chef to manage our
installation, and the upgrade was literally just a change of the recipe to
handle the version, so the resulting instance
Clearly timeAllowed is a bandaid on a much bigger problem here. 3840 fuzzy
phrase queries containing 8-10 terms each on the title field is clearly
something in your query analysis/expansion gone awry... especially since
you seem to be generating phrase slop queries that include your negated
terms...
In fact, the terms in the NOT seem to result in wildly different response
times for no apparent reason:
AND+NOT+(internship+OR+intern+OR+graduate) 23.966s
AND+NOT+(internship+OR+intern+OR+welder) 3.368s
AND+NOT+(internship+OR+welder+OR+graduate) 23.958s
AND+NOT+(welder+OR+intern+OR+graduate) 24.46
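Permutations like these are easy to script rather than hand-edit (a sketch in Python; the host, core name, and parameter values are placeholders, not taken from the thread):

```python
from itertools import combinations
from urllib.parse import urlencode

CANDIDATE_TERMS = ["internship", "intern", "graduate", "welder"]

def build_query_url(not_terms, host="http://localhost:8983", core="jobs"):
    """Build a Solr select URL for the positive clause plus a NOT group.

    host and core are placeholders; adapt to the real installation."""
    q = ("(carroll_county OR Aldi OR Cashier OR Kohls) "
         "AND NOT (" + " OR ".join(not_terms) + ")")
    params = urlencode({"q": q, "timeAllowed": 30000, "debug": "timing"})
    return f"{host}/solr/{core}/select?{params}"

# Every 3-term NOT group drawn from the candidate terms, mirroring
# the four variants timed above.
urls = [build_query_url(c) for c in combinations(CANDIDATE_TERMS, 3)]
for u in urls:
    print(u)
```

Fetching each URL and recording QTime would reproduce the comparison table above systematically.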
Hm.
An update: Making the exact same request, but adding a single term (in this
case, "welder") to the "and not" results in a massively smaller debug
output, and the search is done in under half a second. I can't understand
why such a massive difference from *adding* a term?
"debug":{
"rawquerystring"
It also apparently doesn't allow emails big enough for the debug output.
Here's a link to a Google Doc with the output in:
https://docs.google.com/document/d/1TUPE4Qkc-zjKGCJnn0_YVMOfaLgzlF2YCFF9sz4LNcQ/edit?usp=sharing
I hope that works well enough, if not we'll have to work out some other
option
The mailing list usually strips out attachments. You'll need to paste it
into the body of the email.
On Fri, Nov 8, 2024 at 7:16 AM Dominic Humphries
wrote:
> Fair enough! See attached, if that doesn't work I'll send it inline...
>
> On Thu, 7 Nov 2024 at 18:40, Gus Heck wrote:
>
>> Yes, seeing the final expanded query may shed light on where the time is
>> going, so voluminous output is good.
Fair enough! See attached, if that doesn't work I'll send it inline...
On Thu, 7 Nov 2024 at 18:40, Gus Heck wrote:
> Yes, seeing the final expanded query may shed light on where the time is
> going, so voluminous output is good. Feel free to anonymize any customer
> names or sensitive information with "" or similar.
I tried the exact same procedure with a solr:8.11.1 container - generally,
the query response comes back within 2 seconds. When I reduce timeAllowed in
steps, it doesn't have much impact - e.g. with both timeAllowed=1000 and
timeAllowed=500, .debug.timing.process.query remains in the 1100-1400
region
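When comparing runs like this across timeAllowed values, a small helper that digs the per-component times out of the debug response saves eyeballing (a sketch; the nested sample below is invented to mirror the shape of debug=timing output):

```python
def process_times(response):
    """Return {component: time_ms} from debug.timing.process,
    skipping the aggregate "time" entry."""
    process = response["debug"]["timing"]["process"]
    return {name: v["time"] for name, v in process.items() if name != "time"}

# Invented sample shaped like Solr's debug=timing output:
sample = {
    "debug": {
        "timing": {
            "time": 1234.0,
            "process": {
                "time": 1200.0,
                "query": {"time": 1150.0},
                "facet": {"time": 40.0},
                "debug": {"time": 10.0},
            },
        }
    }
}

print(process_times(sample))  # shows query dominating the request
```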
Did you also run this against 8.11 and get shorter times? It has always
been possible for timeAllowed to overrun somewhat (the code has to
periodically check if limits have been exceeded, and the work between
checks isn't going to stop), but if how much it overruns has changed
drastically we do want to know about it.
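The overrun mechanics can be illustrated with a toy collector that only honours its deadline at periodic checks (purely illustrative Python, not Solr's actual code):

```python
import time

def collect_with_limit(num_docs, time_allowed_s, check_interval=1000):
    """Simulate a collector that honours a time limit only at periodic
    checks. Work done between checks is never interrupted, which is
    why timeAllowed can overrun by up to one interval's worth of work."""
    deadline = time.monotonic() + time_allowed_s
    collected = 0
    for doc in range(num_docs):
        if doc % check_interval == 0 and time.monotonic() > deadline:
            return collected, True   # limit hit: partial results
        collected += 1
    return collected, False          # finished within the limit

# Generous limit: all docs collected, limit never fires.
n, partial = collect_with_limit(5000, time_allowed_s=1.0)
```

If the per-document work between checks gets slower (a heavier query), the overrun grows even though the check interval is unchanged.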
Sure:
"index":{
"numDocs":7349353,
"maxDoc":7834951,
"deletedDocs":485598,
"segmentCount":31,
"segmentsFileSizeInBytes":2727,
"sizeInBytes":22066572844,
"size":"20.55 GB"
On Thu, 7 Nov 2024 at 13:27, Gus Heck wrote:
> This is interesting, can you give us a feel for the size/structure of the
> index (# of documents, size of index, # of shards)?
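For what it's worth, the stats above can be sanity-checked with a little arithmetic (plain Python; the figures are copied from the reply):

```python
# Index stats quoted above.
num_docs = 7_349_353
max_doc = 7_834_951
deleted_docs = 485_598
size_bytes = 22_066_572_844

# Internal consistency: maxDoc = live docs + deleted docs.
assert max_doc == num_docs + deleted_docs

avg_bytes_per_doc = size_bytes / num_docs
deleted_ratio = deleted_docs / max_doc

print(f"~{avg_bytes_per_doc:.0f} bytes/doc")  # roughly 3 KB per live doc
print(f"{deleted_ratio:.1%} deleted")         # ~6% deletions pending merge
```

This matches the "7M docs at 3k/doc" characterisation later in the thread.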
Yes, seeing the final expanded query may shed light on where the time is
going, so voluminous output is good. Feel free to anonymize any customer
names or sensitive information with "" or similar.
On Thu, Nov 7, 2024 at 12:21 PM Dominic Humphries
wrote:
> Yes, sorry, not cloud, afaik it's single-sharded.
An update, I found the part of the query that's making everything so slow:
the q param
When we have
"q":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship
OR intern OR graduate)",
the search is very slow, taking 20-something seconds
When it's just
"q":"(carroll_county OR Aldi OR Cashier OR Kohls)",
Yes, sorry, not cloud, afaik it's single-sharded.
Same query with facet fields removed takes just as long to run. Adding the
debug to the request generates a rather large amount of output, I believe
due to synonyms - I can send them if it's useful, but it's rather a lot?
On Thu, 7 Nov 2024 at 15:
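One way to gauge how far synonym expansion has blown up the parsed query, without pasting the whole debug output, is to count clause types in the parsedquery string (a rough sketch; the regexes assume Lucene's toString() rendering, and the miniature example is invented):

```python
import re

def expansion_summary(parsed_query: str):
    """Rough summary of how large a parsed query grew: counts of
    Synonym() wrappers, quoted phrase clauses, and title-field clauses.
    A text-level estimate, not a real query parse."""
    return {
        "synonym_groups": len(re.findall(r"Synonym\(", parsed_query)),
        "phrases": len(re.findall(r'"[^"]+"', parsed_query)),
        "clauses": parsed_query.count("title:"),
    }

# Invented miniature stand-in for a parsedquery string:
parsed = ('+(Synonym(title:intern title:internship) '
          'title:"graduate trainee" title:welder)')
summary = expansion_summary(parsed)
print(summary)
```

Running this over the real debug output would show whether the clause count explains the size difference between the slow and fast variants.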
Hi all,
I also have little java expertise, but I seem able to replicate the problem
locally like this:
1. Downloaded a large CSV: the UK government's "price paid data" CSV for
an entire year:
http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2019.csv
2.
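Getting a headerless CSV like that into Solr-ready JSON can be sketched as follows (the field names and sample row are illustrative, not the real price-paid schema, which has more columns):

```python
import csv
import io
import json

def csv_rows_to_solr_docs(csv_text, field_names):
    """Turn headerless CSV rows into Solr JSON documents.
    field_names must be supplied by the caller because the
    price-paid CSV has no header row; these names are made up."""
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(field_names, row)) for row in reader]

# Tiny invented sample standing in for pp-2019.csv rows:
sample = '"{GUID}","250000","2019-01-04","SW1A 1AA"\n'
docs = csv_rows_to_solr_docs(sample, ["id", "price", "date", "postcode"])
payload = json.dumps(docs)  # POST this to /solr/<core>/update?commit=true
```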
Ok so that's 7M docs at 3k/doc... a relatively reasonable index (at least
if the hardware is reasonable, and you say it did work on 8.11 so that's
probably fine).
By your reply I assume it's single sharded and not using cloud/zookeeper?
The request you showed has a lot of facets on it. How much
This is interesting, can you give us a feel for the size/structure of the
index (# of documents, size of index, # of shards)?
On Thu, Nov 7, 2024 at 7:52 AM Dominic Humphries
wrote:
> An update, I found the part of the query that's making everything so slow:
> the q param
>
> When we have
>
I spoke too soon, I figured out how to get VisualVM talking to solr. Now
I'm just not sure what to do with it - what sorts of things am I looking
for?
On Wed, 6 Nov 2024 at 16:40, Dominic Humphries wrote:
> Unfortunately I don't know Java anywhere near well enough to know my way
> around a profiler or jstack.
Unfortunately I don't know Java anywhere near well enough to know my way
around a profiler or jstack. I've confirmed JMX is enabled and I can telnet
to the port, but VisualVM fails to connect and gives me no reason as to
why.
I can post the query and result if that's useful - it doesn't return any
If you have access to a test instance where the problem can be reproduced,
attaching a profiler would be one way. Another cruder method is to use
jstack to dump all the threads.
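A full jstack dump can be overwhelming; a crude filter for the request-handling threads helps narrow it down (a sketch: Solr's Jetty request threads are typically named qtp..., and the dump fragment below is invented):

```python
def searcher_threads(jstack_output: str, marker: str = "qtp"):
    """Pull out thread header lines from a jstack dump whose names
    contain a marker. A plain text filter, enough to spot which
    request threads are RUNNABLE and worth reading in full."""
    return [line for line in jstack_output.splitlines()
            if line.startswith('"') and marker in line]

# Invented fragment of a thread dump:
dump = '''"qtp123-45" #45 prio=5 RUNNABLE
   java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon WAITING
'''
threads = searcher_threads(dump)
print(threads)
```

Taking two or three dumps a few seconds apart during a slow query and diffing the qtp stacks usually shows where the time is going.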
Another way to tackle this is to help us reproduce your problem. Can you
share details about your query? Obviously, please
I've tried both timeAllowed and cpuAllowed and neither are restricting the
amount of time the queries take to run. I have a test query that's reliably
taking 20-30 seconds, if there's any useful debug params or such I can run
to provide the information you want I'm happy to run them - I'm not sure
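One quick check of whether a limit actually fired at all is the partialResults flag Solr sets in the responseHeader when timeAllowed or cpuAllowed cuts a query short (a sketch with invented responses):

```python
def limit_was_hit(response: dict) -> bool:
    """True if Solr reported the query was cut short by a limit.
    When timeAllowed or cpuAllowed fires, partialResults appears in
    the responseHeader; its absence means the query ran to completion,
    i.e. the limit never triggered."""
    return bool(response.get("responseHeader", {}).get("partialResults"))

# Invented responses illustrating both cases:
cut_short = {"responseHeader": {"status": 0, "QTime": 1043,
                                "partialResults": True}}
ran_fully = {"responseHeader": {"status": 0, "QTime": 23966}}

print(limit_was_hit(cut_short), limit_was_hit(ran_fully))
```

If the 20-30 second query never shows partialResults, the limit is not being checked on the code path the query spends its time in, which narrows the bug down considerably.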
There are unit tests that seem to suggest that timeAllowed still works, can
you provide some more information about your use case? Particularly
important is any information about where (what code) your queries are
spending a lot of time in if you have it.
On Wed, Nov 6, 2024 at 6:18 AM Dominic Humphries