Morten Bøgeskov created SOLR-13337:
--------------------------------------
Summary: TermsComponent sharded and terms.sort=index performance
Key: SOLR-13337
URL: https://issues.apache.org/jira/browse/SOLR-13337
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: SearchComponents - other
Affects Versions: 7.7
Environment: Linux 64bit debian
20-node cluster
Reporter: Morten Bøgeskov
Attachments: terms-component-index-order-speedup.patch
When the TermsComponent distributes a request across shards, it asks every
shard for all of its terms (terms.limit=-1).
This should not be necessary when using terms.sort=index.
With terms.lower=a on a small test collection (400k entries), a request to
/terms?terms.fl=register&terms.sort=index&terms.lower=a
took 8.5-11.5s. I did not try it on production data (10x the size).
I understand why all terms are fetched when sorting by count. When sorting by
index, however, no more than terms.limit rows are required from any shard; most
likely some of them will be discarded because the same term is present in more
than one shard. This holds only when no terms.mincount/terms.maxcount is given
(which definitely throws a spanner in the works).
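
To illustrate the argument, here is a minimal sketch in Python (not the Solr code and not the attached patch). The shard contents and the helper names shard_terms and merge_index_order are made up for the example, and it assumes that a count filter such as terms.mincount is applied to the merged totals. It compares merging complete per-shard term lists with merging lists truncated to terms.limit.

```python
from collections import Counter


def shard_terms(shard, lower, limit=None):
    """Terms >= lower from one shard in index order, optionally truncated to limit."""
    terms = sorted((t, c) for t, c in shard.items() if t >= lower)
    return terms if limit is None else terms[:limit]


def merge_index_order(responses, limit, mincount=1):
    """Sum counts across shard responses, filter by mincount, keep the first limit terms."""
    totals = Counter()
    for response in responses:
        for term, count in response:
            totals[term] += count
    kept = [(t, c) for t, c in sorted(totals.items()) if c >= mincount]
    return kept[:limit]


# Hypothetical term -> document count maps for three shards.
shards = [
    {"apple": 3, "banana": 1, "cherry": 2, "date": 5},
    {"apricot": 4, "banana": 2, "elder": 1},
    {"apple": 1, "fig": 7, "grape": 2},
]
limit = 3

all_rows = [shard_terms(s, "a") for s in shards]            # like terms.limit=-1
limited_rows = [shard_terms(s, "a", limit) for s in shards]  # only limit rows per shard

# Without a count filter both strategies give the same merged page.
full = merge_index_order(all_rows, limit)
truncated = merge_index_order(limited_rows, limit)

# With a count filter on the merged totals, truncation can drop a qualifying term.
full_min = merge_index_order(all_rows, limit, mincount=4)
truncated_min = merge_index_order(limited_rows, limit, mincount=4)
```

In this toy data, full and truncated are identical. Any term in the merged first terms.limit rows is also within the first terms.limit rows of every shard that contains it, so its summed count is complete. With mincount=4 the truncated variant loses "date", which is the fourth term on its shard and was never sent.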
I've attached what I think would do the trick.
I haven't actually tested the patch (it compiles, but some other files in my
checkout don't: ant compile, javac: "error: cannot find symbol").
SOLR-2908 might be a somewhat related issue, though I didn't quite follow the
more subtle points in it.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)