Re: streaming expressions - sharding memory usage

Sergio García Maroto Wed, 10 May 2023 03:47:45 -0700

Sure. Let's start with the simplest streaming expression.
This one only targets the person collection.


*Stream Expression:*
search(person, q="((((SmartSearchS:"france [$CU] [$PRJ] [$REC] "~100)^4 OR
(SmartSearchS:"france [$CU] [$PRJ] [$RECL] "~100)^3 OR
(SmartSearchS:"france [$CU] [$PRJ] "~100)^2) OR (((SmartSearchS:(france*)))
OR ((SmartSearchS:("france")))^3)) AND ((*:* -StatusSFD:("\*\*\*System
Delete\*\*\*")) AND type_level:(parent)))", fl="PersonIDDoc,score",
sort="score desc,PersonIDDoc desc", rows="1000")
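
For anyone wanting to reproduce the timing, here is a minimal Python sketch of building the request URL for the collection's /stream handler, which takes the expression in the expr parameter. The host, port, and the shortened expression are placeholders for illustration, not the real cluster or the full query above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_stream_url(base_url, collection, expr):
    """Return a URL that passes `expr` to the /stream handler of `collection`."""
    # urlencode percent-encodes quotes, parentheses, commas and spaces in the expression.
    query = urlencode({"expr": expr})
    return f"{base_url.rstrip('/')}/{collection}/stream?{query}"


# Placeholder host and a shortened expression (assumptions, not the real setup).
BASE_URL = "http://localhost:8983/solr"
EXPR = (
    'search(person, q="type_level:(parent)", fl="PersonIDDoc,score", '
    'sort="score desc,PersonIDDoc desc", rows="1000")'
)

url = build_stream_url(BASE_URL, "person", EXPR)
print(url)
```

The URL can then be fetched with any HTTP client and timed against the sharded and unsharded collections.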

*Schema*
<field name="PersonIDDoc" type="string" indexed="true" stored="true"
docValues="true" />

*No sharding*
*1 shard, 45.38 GB, with 64,348,740 docs*
stream expression time: 660 ms

*Sharding*
*2 shards, 23 GB each*
stream expression time: 4000 ms
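
For scale, the two timings above work out to roughly a 6x slowdown. A quick sketch of the arithmetic, using only the two numbers reported here (single measurements, not averages):

```python
# Timings reported above, in milliseconds.
unsharded_ms = 660
sharded_ms = 4000

# Ratio of sharded to unsharded time for the same expression.
slowdown = sharded_ms / unsharded_ms
print(f"sharded is about {slowdown:.1f}x slower")
```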



On Wed, 10 May 2023 at 04:45, Joel Bernstein <joels...@gmail.com> wrote:

> Can you share the expressions? Then we can discuss where the sharding comes
> into play.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, May 9, 2023 at 1:17 PM Sergio García Maroto <marot...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am currently working on implementing sharding on the current SolrCloud
> > cluster.
> > The main idea is to be able to scale horizontally.
> >
> > At the moment, without sharding, we have all collections sitting on all
> > servers.
> > We also have pretty heavy streaming expressions returning many ids,
> > on average 300,000 ids to join.
> >
> > After sharding I see a huge increase in CPU and memory usage,
> > which makes queries much slower with sharding than without.
> >
> > I guess that's expected because the joins need to send data across
> > servers over the network.
> >
> > Any thoughts on best practices here? I guess a possible approach is to
> > split into more shards.
> >
> > Regards
> > Sergio
> >
>
