On 7/22/2021 11:53 AM, Jon Morisi wrote:

RE Shawn and Michael,
I am just looking for a way to speed it up.  Mike Drob had mentioned docvalues, 
which is why I was researching that route.

I am running my search tests from the Solr admin UI, no facets, no sorting.  I am 
using -Dsolr.directoryFactory=HdfsDirectoryFactory

Getting good caching with HDFS is something I am not sure how to do.  I would bet that you have to assign a whole bunch of memory to the Solr heap and then allocate a lot of that to the HDFS client for caching purposes.

You can take a look at this wiki page I wrote, but keep in mind that it is tailored for local disks, not HDFS:

https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

Is there any way you can switch to local disks instead of HDFS? Solr tends to perform badly with indexes on the network instead of local.  What are you trying to achieve with your usage of HDFS?

URL:
. /select?q=ptokens:8974561 AND ptokens:9844554 AND ptokens:8564484 AND 
ptokens:9846541&echoParams=all

Response once it ran (timeout on first attempt, waited 5min for re-try):
responseHeader  
zkConnected     true
status  0
QTime   2411
params  
q       "ptokens:243796009 AND ptokens:410512000 AND ptokens:410604004 AND 
ptokens:408729009"
df      "data"
rows    "10"
echoParams      "all"

What is the field definition for ptokens, and what is the fieldType definition for the type that the field references?  If this field is set up as a numeric Point type, you're running into a known limitation: single-value lookups on Point fields are slow, and if the field cardinality is high, they become VERY slow.  The workaround would be to switch to either a String type or a Trie type, and completely reindex.  Keep in mind that Trie types are deprecated and will eventually be removed from Solr, so String is the safer choice.  Or you could turn the query into a range query, and it would work much better.  Point types are EXCELLENT for range queries.
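To make the difference between the two query shapes concrete, here is a rough sketch in Python that builds the single-value form you are sending now and a range form over the same values.  The helper names (term_query, range_query) are made up for illustration, and the "/select" path is just the handler from your URL with no host or collection filled in.  I am assuming a closed range of [v TO v] is what you would want if you must match exact values; whether that actually beats the plain lookup on your index is something you would have to measure.

```python
from urllib.parse import urlencode


def term_query(field, values):
    """Single-value lookups joined with AND (the shape of the current query)."""
    return " AND ".join(f"{field}:{v}" for v in values)


def range_query(field, values):
    """Same values expressed as closed ranges, using Solr's [low TO high] syntax."""
    return " AND ".join(f"{field}:[{v} TO {v}]" for v in values)


# Token values copied from the URL in the original message.
tokens = [8974561, 9844554, 8564484, 9846541]

# urlencode takes care of escaping spaces, colons, and brackets.
params = urlencode({"q": range_query("ptokens", tokens), "echoParams": "all"})
print("/select?" + params)
```

If your real use case can tolerate a true range (a low and a high bound that differ), that is where Point fields do their best work.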

dashboard info:
System: 0.16 0.13 0.14
Physical Memory: 97.7% (368.77 GB of 377.39 GB)
Swap Space: 4.7% (193.25 MB of 4.00 GB)
File Descriptor Count: 0.2% (226 of 128000)
JVM-Memory: 22.7% (15.33 GB, 15.33 GB)

If disabling swap as Michael is suggesting DOES make performance better, I think that would be an indication of some very strange system level problems.  I don't expect it to change anything.

Thanks,
Shawn
