I think Mike mentioned docValues because they can save large amounts of heap memory for some usage patterns. My follow-up was specifically about two points: 1. given the extra info you've provided, you don't appear to actually _have_ any of the use cases that would benefit from docValues; and 2. only the `sort` use case would benefit from SortableTextField, and in that case you'd be sorting on docValues containing the entire input string, untokenized (which, given the example content you've provided, seems unlikely to be what you want).
My first suggestion (the one most likely to yield results) is to disable swap (i.e., `swapoff -a`). I have no experience with HDFS in particular, but the benefit of disabling swap would be magnified in the unlikely event that your swap partition is on a network filesystem. That's why I asked about network filesystems (I should have been more specific): the question has nothing to do with the filesystem on which the index itself is stored.
My other suggestion _might_ help, but should be viewed more as an attempt to rule out extraneous inefficiencies. If you're initially evaluating the performance of a straight conjunction (all required/"AND" clauses), you could rewrite `q=ptokens:8974561 AND ptokens:9844554 AND ptokens:8564484 AND ptokens:9846541` as `q=*:*&fq=ptokens:8974561&fq=ptokens:9844554&fq=ptokens:8564484&fq=ptokens:9846541`. If this is faster, that would point to scoring overhead and/or possibly the introduction of an undesired, implicit positional query. Appending the param `debug=query` should show you which queries are actually being generated, which could be helpful.
But to reiterate: I would recommend first disabling swap and seeing where that gets you.
On Thu, Jul 22, 2021 at 1:53 PM Jon Morisi wrote:
> RE Shawn and Michael,
> I am just looking for a way to speed it up. Mike Drob had mentioned
> docvalues, which is why I was researching that route.
>
> I am running my search tests from solr admin, no facets, no sorting. I am
> using Dsolr.directoryFactory=HdfsDirectoryFactory
>
> URL:
> /select?q=ptokens:8974561 AND ptokens:9844554 AND ptokens:8564484 AND
> ptokens:9846541&echoParams=all
>
> Response once it ran (timeout on first attempt, waited 5min for re-try):
> responseHeader
> zkConnected true
> status 0
> QTime 2411
> params
> q "ptokens:243796009 AND ptokens:410512000 AND ptokens:410604004 AND
> ptokens:408729009"
> df "data"
> rows "10"
> echoParams "all"
>
> dashboard info:
> System 0.16 0.13 0.14
>
> Physical Memory 97.7%
> 377.39 GB
> 368.77 GB
>
> Swap Space 4.7%
> 4.00 GB
> 193.25 MB
>
> File Descriptor Count 0.2%
> 128000
> 226
>
> JVM-Memory 22.7%
> 15.33 GB
> 15.33 GB
>
> Thanks for looking,
> Jon
>
>
> -----Original Message-----
> From: Shawn Heisey
> Sent: Thursday, July 22, 2021 11:26 AM
> Subject: Re: Solr nodes crashing
>
> On 7/22/2021 10:39 AM, Michael Gibney wrote:
> > ps- wrt requesting a "literal, complete search url" to aid troubleshooting:
> > facets, `sort`, `offset`, and `rows` params would all be of particular
> > interest.
>
> One way to get everything we are after for the query is to add
> "echoParams=all" to the query URL and then include the full
> "responseHeader" part of the response. That will even include parameters
> defined in solrconfig.xml.
>
> "responseHeader":{
>   "status":0,
>   "QTime":55,
>   "params":{
>     "q":"*:*",
>     "df":"_text_",
>     "rows":"10",
>     "echoParams":"all",
>     "_":"1626974542182"}},
>
> Thanks,
> Shawn
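For anyone who wants to script the comparison, here is a minimal Python sketch of the conjunction-to-`fq` rewrite suggested earlier in this reply. It builds both the original single-`q` conjunction and the `q=*:*` plus repeated-`fq` form, each with `debug=query` appended. Only the `q`, `fq`, and `debug` parameters, the `ptokens` field, and the four terms come from this thread; the host, port, and collection name in `BASE` are hypothetical placeholders, as are the helper function names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical endpoint: host, port, and collection name are placeholders,
# not values from this thread.
BASE = "http://localhost:8983/solr/mycollection/select"

# The four terms from the example query in this thread.
TERMS = ["8974561", "9844554", "8564484", "9846541"]


def conjunction_as_q(field, terms, debug=True):
    """Original form: a single (scored) q with every clause ANDed together."""
    params = [("q", " AND ".join(f"{field}:{t}" for t in terms))]
    if debug:
        # debug=query asks Solr to report the query it actually parsed.
        params.append(("debug", "query"))
    return f"{BASE}?{urlencode(params)}"


def conjunction_as_fq(field, terms, debug=True):
    """Rewritten form: match-all q plus one (unscored) fq per required clause."""
    params = [("q", "*:*")] + [("fq", f"{field}:{t}") for t in terms]
    if debug:
        params.append(("debug", "query"))
    return f"{BASE}?{urlencode(params)}"


def decoded_params(url):
    """Decode a URL's query string back into (name, value) pairs for inspection."""
    return parse_qsl(urlsplit(url).query)


if __name__ == "__main__":
    for url in (conjunction_as_q("ptokens", TERMS),
                conjunction_as_fq("ptokens", TERMS)):
        print(url)
        print(decoded_params(url))
```

Running both URLs against the same collection and comparing QTime, along with the parsed query that `debug=query` reports, is one way to check whether scoring or an unexpected query structure accounts for any difference.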
