Re: timeAllowed in Solr 9

2024-11-22 Thread Chris Hostetter
Skimming this thread, I don't see any mention of what seems like the most likely culprit to me: Solr 9.3's refactoring of the timeAllowed functionality hamstrung *MOST* of the code paths that timeAllowed is really useful for (spellcheck, facets, synonym expansion, etc...) by disabling Exitabl

Re: timeAllowed in Solr 9

2024-11-11 Thread Dominic Humphries
Thanks for the detailed reply! I'm fairly sure we don't have any custom components - we literally just grab the tarball from apache.org and install it. We use chef to manage our installation, and the upgrade was literally just a change of the recipe to handle the version, so the resulting instance

Re: timeAllowed in Solr 9

2024-11-11 Thread Gus Heck
Clearly timeAllowed is a bandaid on a much bigger problem here. 3840 fuzzy phrase queries containing 8-10 terms each on the title field is clearly something in your query analysis/expansion gone awry... especially since you seem to be generating phrase slop queries that include your negated terms..
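Gus's point is that expansion multiplies: every position with several alternatives multiplies the number of phrase variants. A minimal Python sketch, assuming a hypothetical synonym table (the entries below are invented for illustration, not taken from Dominic's schema), shows how a few modest synonym lists fan out quickly:

```python
from itertools import product
from math import prod

# Hypothetical synonym table: each query term maps to its alternatives.
# These entries are invented for illustration only.
SYNONYMS = {
    "cashier": ["cashier", "checkout operator", "till operator"],
    "retail": ["retail", "store", "shop", "outlet"],
    "assistant": ["assistant", "associate", "clerk", "helper", "aide"],
}


def expand(terms):
    """Return every phrase variant produced by picking one alternative per term."""
    options = [SYNONYMS.get(term, [term]) for term in terms]
    return [" ".join(choice) for choice in product(*options)]


def variant_count(terms):
    """Count the variants without materialising them."""
    return prod(len(SYNONYMS.get(term, [term])) for term in terms)


variants = expand(["retail", "cashier", "assistant"])
print(len(variants))  # 60
print(variants[0])    # retail cashier assistant
```

One more position with eight alternatives would take this to 480. This only sketches the arithmetic of the blow-up; it is not how Solr's query parser actually builds its phrase queries.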

Re: timeAllowed in Solr 9

2024-11-11 Thread Dominic Humphries
In fact, the terms in the NOT seem to result in wildly different response times for no apparent reason:
AND+NOT+(internship+OR+intern+OR+graduate) 23.966s
AND+NOT+(internship+OR+intern+OR+welder) 3.368s
AND+NOT+(internship+OR+welder+OR+graduate) 23.958s
AND+NOT+(welder+OR+intern+OR+graduate) 24.46
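To compare NOT variants like these more systematically, a small harness can time each `q` value several times. This is a sketch: `run_query` is a hypothetical callable standing in for whatever actually sends the request to Solr (an HTTP call, SolrJ, etc.), and the variants reuse the query quoted in this thread.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_variants(
    run_query: Callable[[str], object],
    variants: List[str],
    repeats: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Time each query variant `repeats` times.

    Returns, per variant, the first (coldest) sample and the median in
    seconds. run_query is a hypothetical callable that sends one query
    to Solr; passing it in keeps the harness free of network code.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[str, Dict[str, float]] = {}
    for q in variants:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(q)
            samples.append(time.perf_counter() - start)
        results[q] = {"first": samples[0], "median": statistics.median(samples)}
    return results


BASE = "(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT "
variants = [
    BASE + "(internship OR intern OR graduate)",
    BASE + "(internship OR intern OR welder)",
]
```

The first sample is reported separately because Solr's caches can make an identical repeated query return much faster than its first run.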

Re: timeAllowed in Solr 9

2024-11-11 Thread Dominic Humphries
Hm. An update: Making the exact same request, but adding a single term (in this case, "welder") to the "and not", results in a massively smaller debug output, and the search is done in under half a second. I can't understand why *adding* a term would make such a massive difference. "debug":{ "rawq

Re: timeAllowed in Solr 9

2024-11-08 Thread Dominic Humphries
It also apparently doesn't allow emails big enough for the debug output. Here's a link to a Google Doc with the output in: https://docs.google.com/document/d/1TUPE4Qkc-zjKGCJnn0_YVMOfaLgzlF2YCFF9sz4LNcQ/edit?usp=sharing I hope that works well enough, if not we'll have to work out some other option

Re: timeAllowed in Solr 9

2024-11-08 Thread Gus Heck
The mailing list usually strips out attachments. You'll need to paste it into the body of the email.

Re: timeAllowed in Solr 9

2024-11-08 Thread Dominic Humphries
Fair enough! See attached; if that doesn't work I'll send it inline...

Re: Re: timeAllowed in Solr 9

2024-11-08 Thread Darren Foreman
I tried the exact same procedure with a solr:8.11.1 container - generally, the query response comes back within 2 seconds. When I reduce timeAllowed in steps, it doesn't have much impact - e.g. with both timeAllowed=1000 and timeAllowed=500, .debug.timing.process.query remains in the 1100-1400 reg
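Darren's measurement reads the query component's time from the `debug=timing` section of the JSON response. A minimal sketch of the same check in Python, assuming the response has already been parsed into a dict; the sample below is an illustrative stub containing only the keys the sketch reads, with a value picked from the range Darren reported, not captured output:

```python
from typing import Any, Dict


def query_phase_ms(response: Dict[str, Any]) -> float:
    """Return debug.timing.process.query.time from a parsed Solr response."""
    return float(response["debug"]["timing"]["process"]["query"]["time"])


def overrun_ms(response: Dict[str, Any], time_allowed_ms: int) -> float:
    """How far the query phase ran past timeAllowed (0 if it stayed within it)."""
    return max(0.0, query_phase_ms(response) - time_allowed_ms)


# Illustrative stub, not real output: only the keys this sketch reads.
sample = {"debug": {"timing": {"process": {"query": {"time": 1100.0}}}}}
print(overrun_ms(sample, 500))  # 600.0
```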

Re: Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
Did you also run this against 8.11 and get shorter times? It has always been possible for timeAllowed to overrun somewhat (the code has to periodically check if limits have been exceeded, and the work between checks isn't going to stop), but if how much it overruns has changed drastically we do wan
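The overrun Gus describes follows from how cooperative time limits work: the clock is only consulted at check points, so expensive work between checks runs to completion past the deadline. A minimal Python sketch of that pattern, with hypothetical names and sleep-based work units standing in for real index traversal (this is not Solr's implementation):

```python
import time
from typing import Iterable, Tuple


def process(units: Iterable[float], time_allowed_s: float,
            check_every: int = 1) -> Tuple[int, float]:
    """Run simulated work units, checking the deadline only at check points.

    Each unit is a sleep of the given number of seconds, standing in for
    uninterruptible work. The deadline is compared against the clock only
    before every `check_every`-th unit. Returns (units completed, elapsed
    seconds); stopping early corresponds loosely to returning partial results.
    """
    start = time.monotonic()
    deadline = start + time_allowed_s
    done = 0
    for i, cost in enumerate(units):
        if i % check_every == 0 and time.monotonic() > deadline:
            break  # limit noticed at a check point: stop with partial work
        time.sleep(cost)  # nothing interrupts work between checks
        done += 1
    return done, time.monotonic() - start


done, elapsed = process([0.2, 0.2, 0.2], time_allowed_s=0.05)
print(done, round(elapsed, 2))
```

With `check_every=1` the overrun is bounded by the cost of one unit; widening the interval, or a code path that never checks at all, lets the overrun grow accordingly.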

Re: timeAllowed in Solr 9

2024-11-07 Thread Dominic Humphries
Sure: "index":{ "numDocs":7349353, "maxDoc":7834951, "deletedDocs":485598, "segmentCount":31, "segmentsFileSizeInBytes":2727, "sizeInBytes":22066572844, "size":"20.55 GB"

Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
Yes, seeing the final expanded query may shed light on where the time is going, so voluminous output is good. Feel free to anonymize any customer names or sensitive information with "" or similar.

Re: timeAllowed in Solr 9

2024-11-07 Thread Dominic Humphries
An update: I've found the part of the query that's making everything so slow. It's the q param. When we have "q":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship OR intern OR graduate)", the search is very slow, taking 20-something seconds. When it's just "q":"(carroll_county O
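To reproduce this outside the application, the same request can be built with Python's standard library. In this sketch the host, port and core name `jobs` are hypothetical placeholders, while `q`, `timeAllowed` (milliseconds) and `debug` are standard Solr request parameters; `debug=query` returns the parsed, expanded query Gus asked to see. Note that `urlencode` turns spaces into `+` and percent-encodes the parentheses, which is why clauses like these appear as `AND+NOT+...` in request URLs.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core name to your install.
SELECT_URL = "http://localhost:8983/solr/jobs/select"


def build_select_url(q: str, time_allowed_ms: int, debug: str = "query") -> str:
    """Build a /select URL carrying the query, a time limit and debug output."""
    params = {
        "q": q,
        "timeAllowed": time_allowed_ms,  # milliseconds
        "debug": debug,                  # "query" shows the parsed/expanded query
        "wt": "json",
    }
    return SELECT_URL + "?" + urlencode(params)


url = build_select_url(
    "(carroll_county OR Aldi OR Cashier OR Kohls) "
    "AND NOT (internship OR intern OR graduate)",
    time_allowed_ms=1000,
)
print(url)
```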

Re: timeAllowed in Solr 9

2024-11-07 Thread Dominic Humphries
Yes, sorry, not cloud, afaik it's single-sharded. Same query with facet fields removed takes just as long to run. Adding the debug to the request generates a rather large amount of output, I believe due to synonyms - I can send them if it's useful, but it's rather a lot?

RE: Re: timeAllowed in Solr 9

2024-11-07 Thread Darren Foreman
Hi all, I also have little Java expertise, but I seem able to replicate the problem locally like this: 1. Downloaded a large CSV: the UK government's "price paid data" CSV for an entire year: http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2019.csv 2.

Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
Ok so that's 7M docs at 3k/doc... a relatively reasonable index (at least if the hardware is reasonable, and you say it did work on 8.11 so that's probably fine). By your reply I assume it's single sharded and not using cloud/zookeeper? The request you showed has a lot of facets on it. How much
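Gus's "7M docs at 3k/doc" estimate can be checked directly against the index stats Dominic posted; a short worked calculation:

```python
# Figures copied from the "index" block Dominic posted in this thread.
num_docs = 7_349_353
max_doc = 7_834_951
deleted_docs = 485_598
size_in_bytes = 22_066_572_844

# maxDoc also counts deleted documents that have not yet been merged away.
assert max_doc - num_docs == deleted_docs

bytes_per_live_doc = size_in_bytes / num_docs
size_gib = size_in_bytes / 1024**3  # Solr's "20.55 GB" is 1024-based

print(f"{bytes_per_live_doc:.0f} bytes per live doc")  # 3003 bytes per live doc
print(f"{size_gib:.2f} GiB")                            # 20.55 GiB
```

Dividing by maxDoc instead, since deleted documents still occupy space until merged, gives roughly 2.8k per document; either way the estimate holds.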

Re: timeAllowed in Solr 9

2024-11-07 Thread Gus Heck
This is interesting. Can you give us a feel for the size/structure of the index (# of documents, size of index, # of shards)?

Re: timeAllowed in Solr 9

2024-11-06 Thread Dominic Humphries
I spoke too soon: I figured out how to get VisualVM talking to Solr. Now I'm just not sure what to do with it - what sorts of things am I looking for?

Re: timeAllowed in Solr 9

2024-11-06 Thread Dominic Humphries
Unfortunately I don't know Java anywhere near well enough to know my way around a profiler or jstack. I've confirmed JMX is enabled and I can telnet to the port, but VisualVM fails to connect and gives me no reason as to why. I can post the query and result if that's useful - it doesn't return any

Re: timeAllowed in Solr 9

2024-11-06 Thread Gus Heck
If you have access to a test instance where the problem can be reproduced, attaching a profiler would be one way. Another, cruder method is to use jstack to dump all the threads. Another way to tackle this is to help us reproduce your problem. Can you share details about your query? Obviously, plea

Re: timeAllowed in Solr 9

2024-11-06 Thread Dominic Humphries
I've tried both timeAllowed and cpuAllowed, and neither is restricting the amount of time the queries take to run. I have a test query that's reliably taking 20-30 seconds. If there are any useful debug params or such I can run to provide the information you want, I'm happy to run them - I'm not sure

Re: timeAllowed in Solr 9

2024-11-06 Thread Gus Heck
There are unit tests that seem to suggest that timeAllowed still works. Can you provide some more information about your use case? Particularly important is any information about where (what code) your queries are spending a lot of time, if you have it.