Re: streaming expressions - sharding memory usage

Joel Bernstein Wed, 10 May 2023 07:18:19 -0700

So the first thing I see is that you're doing a search using the select
handler, which is required to sort by score. So in this scenario you will
run into deep paging issues as you increase the number of rows. This will
effect both memory and performance. A search using the export handler will
improve throughput as you add shards, without any memory penalty, but it
doesn't support scoring.


In this case 1000 rows is not that many docs, so I'm surprised the penalty
is so high. But you will definitely run into large memory and performance
penalties if you start pulling larger result sets.

Can you describe the exact use case you need to accomplish? For example, do
you need to extract a large number of documents by joining streams of
scored data? Or can you display just the top N documents of the joined
streams?




Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 10, 2023 at 6:47 AM Sergio García Maroto <marot...@gmail.com>
wrote:

> Sure. Let's start by the simplest stream expression.
> This one only targets person collection.
>
> *Stream Expression:*
> search(person, q="((((SmartSearchS:"france [$CU] [$PRJ] [$REC] "~100)^4 OR
> (SmartSearchS:"france [$CU] [$PRJ] [$RECL] "~100)^3 OR
> (SmartSearchS:"france [$CU] [$PRJ] "~100)^2) OR (((SmartSearchS:(france*)))
> OR ((SmartSearchS:("france")))^3)) AND ((*:* -StatusSFD:("\*\*\*System
> Delete\*\*\*")) AND type_level:(parent)))", fl="PersonIDDoc,score",
> sort="score desc,PersonIDDoc desc", rows="1000")
>
> *Schema*
> <field name="PersonIDDoc" type="string" indexed="true" stored="true"
> docValues="true" />
>
> *No sharding*
> *1 shard 45.38GB with *64,348,740 docs
> stream expresion time : 660 ms
>
> *S**harding*
> *2 shards 23GB each*
> stream expresion time : 4000 ms
>
>
>
> On Wed, 10 May 2023 at 04:45, Joel Bernstein <joels...@gmail.com> wrote:
>
> > Can you share the expressions? Then we can discuss where the sharding
> comes
> > into play.
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Tue, May 9, 2023 at 1:17 PM Sergio García Maroto <marot...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am working currently on implementing sharding on current Solr Cloud
> > > Cluster.
> > > Main idea is to be able to scale horizontally.
> > >
> > > At the moment, without sharding we have all collections sitting on all
> > > servers.
> > > We have as well pretty heavy streaming expressions returning many ids.
> > > Average of 300,000 ids to join.
> > >
> > > After  doing sharding I see a huge increase on CPU and memory usage.
> > > Making queries way slower comparing sharding to not sharding.
> > >
> > > I guess that's  expected bacuase the joins need to send data across
> > servers
> > > over network.
> > >
> > > Any thoughs on best practices here. I guess a possible approach is to
> > split
> > > shards in more.
> > >
> > > Regards
> > > Sergio
> > >
> >
>

Re: streaming expressions - sharding memory usage

Reply via email to