So the first thing I see is that you're doing a search using the select handler, which is required to sort by score. So in this scenario you will run into deep paging issues as you increase the number of rows. This will effect both memory and performance. A search using the export handler will improve throughput as you add shards, without any memory penalty, but it doesn't support scoring.
In this case 1000 rows is not that many docs, so I'm surprised the penalty is so high. But you will definitely run into large memory and performance penalties if you start pulling larger result sets. Can you describe the exact use case you need to accomplish? For example, do you need to extract a large number of documents by joining streams of scored data? Or can you display just the top N documents of the joined streams? Joel Bernstein http://joelsolr.blogspot.com/ On Wed, May 10, 2023 at 6:47 AM Sergio García Maroto <marot...@gmail.com> wrote: > Sure. Let's start by the simplest stream expression. > This one only targets person collection. > > *Stream Expression:* > search(person, q="((((SmartSearchS:"france [$CU] [$PRJ] [$REC] "~100)^4 OR > (SmartSearchS:"france [$CU] [$PRJ] [$RECL] "~100)^3 OR > (SmartSearchS:"france [$CU] [$PRJ] "~100)^2) OR (((SmartSearchS:(france*))) > OR ((SmartSearchS:("france")))^3)) AND ((*:* -StatusSFD:("\*\*\*System > Delete\*\*\*")) AND type_level:(parent)))", fl="PersonIDDoc,score", > sort="score desc,PersonIDDoc desc", rows="1000") > > *Schema* > <field name="PersonIDDoc" type="string" indexed="true" stored="true" > docValues="true" /> > > *No sharding* > *1 shard 45.38GB with *64,348,740 docs > stream expresion time : 660 ms > > *S**harding* > *2 shards 23GB each* > stream expresion time : 4000 ms > > > > On Wed, 10 May 2023 at 04:45, Joel Bernstein <joels...@gmail.com> wrote: > > > Can you share the expressions? Then we can discuss where the sharding > comes > > into play. > > > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > > > On Tue, May 9, 2023 at 1:17 PM Sergio García Maroto <marot...@gmail.com> > > wrote: > > > > > Hi, > > > > > > I am working currently on implementing sharding on current Solr Cloud > > > Cluster. > > > Main idea is to be able to scale horizontally. > > > > > > At the moment, without sharding we have all collections sitting on all > > > servers. > > > We have as well pretty heavy streaming expressions returning many ids. > > > Average of 300,000 ids to join. > > > > > > After doing sharding I see a huge increase on CPU and memory usage. > > > Making queries way slower comparing sharding to not sharding. > > > > > > I guess that's expected bacuase the joins need to send data across > > servers > > > over network. > > > > > > Any thoughs on best practices here. I guess a possible approach is to > > split > > > shards in more. > > > > > > Regards > > > Sergio > > > > > >