I think Mike mentioned docValues because docValues can save large amounts
of heap memory for some use patterns. I was following up specifically wrt:
1. given the extra info you've provided, you don't appear to actually
_have_ any of the use cases that would benefit from docValues, and
2. only the `sort` use case would benefit from SortableTextField, in which
case you'd be sorting on docValues containing the entire input string,
untokenized (which, given the example content you've provided, seems
unlikely to be what you want to do).

My first suggestion (most likely to yield results) is: disable swap (i.e.,
`swapoff -a`). I have no experience with hdfs in particular, but the impact
of disabling swap would be magnified in the unlikely event that your swap
partition is on a network filesystem. That's why I asked about network fs;
I should have been more specific. (That question had nothing to do with
the filesystem on which the index itself is stored.)

My other suggestion _might_ help, but should be viewed more as trying to
rule out extraneous inefficiencies:

If you're initially evaluating performance of straight-conjunction (all
required/"AND" clauses), you could rewrite `q=ptokens:8974561 AND
ptokens:9844554 AND ptokens:8564484 AND ptokens:9846541` as
`q=*:*&fq=ptokens:8974561&fq=ptokens:9844554&fq=ptokens:8564484&fq=ptokens:9846541`.
If this is faster, that would indicate that the extra time is going to
scoring and/or possibly to an undesired, implicit positional query.
Appending the param `debug=query` should show you what queries are actually
being generated, which could be helpful.
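
To make the rewrite concrete, here is a minimal Python sketch that builds the
filter-query form of the request with each clause URL-encoded. The helper name
`conjunction_as_filters` is hypothetical (it is not part of Solr or any client
library); the field name and terms are taken from the example query, and
`debug=query` is added as suggested above.

```python
from urllib.parse import urlencode, parse_qs

def conjunction_as_filters(field, terms, **extra):
    """Express an AND of terms as q=*:* plus one fq clause per term."""
    params = [("q", "*:*")]
    # One fq per term: each filter is non-scoring, and all of them must match.
    params += [("fq", f"{field}:{term}") for term in terms]
    # Any additional request params, e.g. debug=query.
    params += list(extra.items())
    return urlencode(params)

terms = ["8974561", "9844554", "8564484", "9846541"]
query_string = conjunction_as_filters("ptokens", terms, debug="query")

# Path relative to the collection; the host and collection are not assumed here.
print("/select?" + query_string)

# Decode again to check the parameters the server would receive.
print(parse_qs(query_string))
```

Timing this request against the original `q=... AND ...` form (ideally several
runs of each, since the filter cache will affect repeated requests) should
show whether scoring is a meaningful part of the cost.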

But to reiterate: I would recommend first disabling swap and seeing where
that gets you.

On Thu, Jul 22, 2021 at 1:53 PM Jon Morisi <[email protected]> wrote:

> RE Shawn and Michael,
> I am just looking for a way to speed it up.  Mike Drob had mentioned
> docvalues, which is why I was researching that route.
>
> I am running my search tests from solr admin, no facets, no sorting.  I am
> using Dsolr.directoryFactory=HdfsDirectoryFactory
>
> URL:
> . /select?q=ptokens:8974561 AND ptokens:9844554 AND ptokens:8564484 AND
> ptokens:9846541&echoParams=all
>
> Response once it ran (timeout on first attempt, waited 5min for re-try):
> responseHeader
> zkConnected     true
> status  0
> QTime   2411
> params
> q       "ptokens:243796009 AND ptokens:410512000 AND ptokens:410604004 AND
> ptokens:408729009"
> df      "data"
> rows    "10"
> echoParams      "all"
>
> dashboard info:
> System 0.16 0.13 0.14
>
> Physical Memory 97.7%
> 377.39 GB
> 368.77 GB
>
> Swap Space 4.7%
> 4.00 GB
> 193.25 MB
>
> File Descriptor Count 0.2%
> 128000
> 226
>
> JVM-Memory 22.7%
> 15.33 GB
> 15.33 GB
>
> Thanks for looking,
> Jon
>
>
> -----Original Message-----
> From: Shawn Heisey <[email protected]>
> Sent: Thursday, July 22, 2021 11:26 AM
> To: [email protected]
> Subject: Re: Solr nodes crashing
>
> On 7/22/2021 10:39 AM, Michael Gibney wrote:
> > ps- wrt requesting a "literal, complete search url" to aid
> troubleshooting:
> > facets, `sort`, `offset`, and `rows` params would all be of particular
> > interest.
>
>
> One way to get everything we are after for the query is to add
> "echoParams=all" to the query URL and then include the full
> "responseHeader" part of the response.  That will even include parameters
> defined in solrconfig.xml.
>
> "responseHeader":{ "status":0, "QTime":55, "params":{ "q":"*:*",
> "df":"_text_", "rows":"10", "echoParams":"all", "_":"1626974542182"}},
>
> Thanks,
> Shawn
>
>
