Thanks, Joel, for your answer. What I actually need is to return scores from different collections and combine them with some calculations to retrieve people at the end. Let me show you a more complex sample. This is really fast when all collections sit on the same servers, but very slow once sharding comes into play.
select(
  select(
    rollup(
      innerJoin(
        select(
          search(person,
            q="((((SmartSearchS:"madrid [$CU] [$PRJ] [$REC] "~100)^4 OR
                (SmartSearchS:"madrid [$CU] [$PRJ] [$RECL] "~100)^3 OR
                (SmartSearchS:"madrid [$CU] [$PRJ] "~100)^2) OR
                (((SmartSearchS:(madrid*))) OR ((SmartSearchS:("madrid")))^3)) AND
                ((*:* -StatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))",
            fl="PersonIDDoc,score",
            sort="PersonIDDoc desc,score desc",
            rows="2147483647"),
          PersonIDDoc,
          score as PersonScore),
        select(
          rollup(
            search(contactupdate,
              q="(((UpdateTextS:(cto)) AND IsAttachToPerson:(true)) AND
                  (((AssignConfidentialS:(false) OR (AssignAuthorisedEmplIdsS:((cc)))))))",
              fl="PersonIDDoc,score",
              sort="PersonIDDoc desc,score desc",
              rows="2147483647"),
            over=PersonIDDoc,
            sum(score),
            sum(PersonIDDoc)),
          PersonIDDoc,
          add(0,0) as DocumetScore,
          sum(score) as ContactUpdateScore),
        on="PersonIDDoc"),
      over=PersonIDDoc,
      sum(PersonScore),
      sum(ContactUpdateScore),
      count(PersonIDDoc)),
    PersonIDDoc,
    sum(ContactUpdateScore) as ContactUpdateScore,
    sum(PersonScore) as PersonScoreTotal,
    count(PersonIDDoc) as TokenCount),
  PersonIDDoc,
  TokenCount,
  add(mult(PersonScoreTotal,0.6),mult(ContactUpdateScore,0.4)) as CalculatedScore)

I know it is a pretty complex query, but it takes about 3 seconds on a single shard and basically explodes with sharding. Does that mean I can't achieve this behaviour with sharded collections?

Regards

On Wed, 10 May 2023 at 16:17, Joel Bernstein <joels...@gmail.com> wrote:

> So the first thing I see is that you're doing a search using the select
> handler, which is required to sort by score. So in this scenario you will
> run into deep paging issues as you increase the number of rows. This will
> affect both memory and performance. A search using the export handler will
> improve throughput as you add shards, without any memory penalty, but it
> doesn't support scoring.
>
> In this case 1000 rows is not that many docs, so I'm surprised the penalty
> is so high.
> But you will definitely run into large memory and performance
> penalties if you start pulling larger result sets.
>
> Can you describe the exact use case you need to accomplish? For example, do
> you need to extract a large number of documents by joining streams of
> scored data? Or can you display just the top N documents of the joined
> streams?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 10, 2023 at 6:47 AM Sergio García Maroto <marot...@gmail.com>
> wrote:
>
> > Sure. Let's start with the simplest stream expression.
> > This one only targets the person collection.
> >
> > *Stream Expression:*
> > search(person,
> >   q="((((SmartSearchS:"france [$CU] [$PRJ] [$REC] "~100)^4 OR
> >       (SmartSearchS:"france [$CU] [$PRJ] [$RECL] "~100)^3 OR
> >       (SmartSearchS:"france [$CU] [$PRJ] "~100)^2) OR
> >       (((SmartSearchS:(france*))) OR ((SmartSearchS:("france")))^3)) AND
> >       ((*:* -StatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))",
> >   fl="PersonIDDoc,score",
> >   sort="score desc,PersonIDDoc desc",
> >   rows="1000")
> >
> > *Schema*
> > <field name="PersonIDDoc" type="string" indexed="true" stored="true"
> >   docValues="true" />
> >
> > *No sharding*
> > *1 shard, 45.38 GB, 64,348,740 docs*
> > stream expression time: 660 ms
> >
> > *Sharding*
> > *2 shards, 23 GB each*
> > stream expression time: 4000 ms
> >
> > On Wed, 10 May 2023 at 04:45, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > Can you share the expressions? Then we can discuss where the sharding
> > > comes into play.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Tue, May 9, 2023 at 1:17 PM Sergio García Maroto <marot...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am currently working on implementing sharding on our Solr Cloud
> > > > cluster.
> > > > The main idea is to be able to scale horizontally.
> > > >
> > > > At the moment, without sharding, we have all collections sitting on all
> > > > servers.
> > > > We also have pretty heavy streaming expressions returning many ids,
> > > > around 300,000 ids to join on average.
> > > >
> > > > After doing sharding I see a huge increase in CPU and memory usage,
> > > > making queries way slower compared to not sharding.
> > > >
> > > > I guess that's expected because the joins need to send data across
> > > > servers over the network.
> > > >
> > > > Any thoughts on best practices here? I guess a possible approach is to
> > > > split the shards further.
> > > >
> > > > Regards
> > > > Sergio
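A note on scaling the join itself: Solr's streaming API can push an innerJoin out to worker nodes with parallel() and matching partitionKeys, so each worker only joins its own slice of PersonIDDoc instead of one node merging everything. A minimal sketch of that shape, under assumptions — the `workers` collection name and worker count are illustrative placeholders, and the q parameters are abbreviated, not the thread's actual queries:

```
parallel(workers,
  innerJoin(
    search(person, q="<person query>", fl="PersonIDDoc,score",
           sort="PersonIDDoc desc", partitionKeys="PersonIDDoc",
           rows="2147483647"),
    search(contactupdate, q="<contactupdate query>", fl="PersonIDDoc,score",
           sort="PersonIDDoc desc", partitionKeys="PersonIDDoc",
           rows="2147483647"),
    on="PersonIDDoc"),
  workers="2",
  sort="PersonIDDoc desc")
```

The rollup/select score arithmetic would sit inside the parallel() the same way. Since scoring still requires the select handler rather than /export, the deep-paging cost Joel describes applies per worker, but the join and aggregation work is at least partitioned.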