Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)

Scott Stults Mon, 28 Aug 2017 14:59:34 -0700

Dani,

It might be time to attach some instrumentation to one of your nodes.
Finding out which classes are occupying the memory will help narrow down
the issue.
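For the per-class view, the JDK's jmap -histo:live <pid> prints an instance and byte histogram of a running JVM. For a quick look at heap and GC pressure, here is a minimal sketch using the standard java.lang.management beans. It only reports on the JVM it runs in; reading a remote Solr node would need a JMX connection, which is not shown, and the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Minimal sketch: heap usage plus cumulative GC counts/times for the current JVM.
class JvmSnapshot {
    static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        // getMax() may be -1 when no maximum is defined; printed as-is here.
        sb.append(String.format("heap used=%d MB committed=%d MB max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("gc %s: count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Sampling this periodically (or the same beans over JMX) shows whether GC time climbs with indexing load.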


Are you using a lot of facets, grouping, or stats during your queries?
Also, when you were doing Master/Slave, was that on the same version of
Solr as you're using now in SolrCloud mode?


-Scott

On Mon, Aug 28, 2017 at 4:50 AM, Daniel Ortega <danielortegauf...@gmail.com>
wrote:

> Hi Scott,
>
> Yes, we think that our usage scenario falls into the Index-Heavy/Query-Heavy
> category too. We have tested several soft commit/hard commit intervals
> (from a few seconds to minutes) with no appreciable improvement :(
>
> Thanks for your reply!
>
> - Daniel
>
> 2017-08-25 6:45 GMT+02:00 Scott Stults <sstu...@opensourceconnections.com>:
>
> > Hi Dani,
> >
> > It seems like your use case falls into the Index-Heavy / Query-Heavy
> > category, so you might try increasing your hard commit frequency to once
> > every 15 seconds rather than once every 15 minutes:
> >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
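To put numbers on that suggestion: a hard commit closes the current transaction log and starts a new one, so the amount of data each core keeps in its tlog (and replays after an unclean restart) grows with the hard commit interval. A back-of-the-envelope sketch, assuming the ~100 docs/s average and ~1000 docs/s peak quoted further down in this thread, a steady rate, and an even spread over the 3 shards (the class and method names are hypothetical):

```java
// Back-of-the-envelope sketch: documents accumulated in one core's tlog
// between hard commits, assuming a steady rate spread evenly over the shards.
class TlogEstimate {
    static long docsPerCorePerTlog(long docsPerSecond, long hardCommitSeconds, int shards) {
        return docsPerSecond * hardCommitSeconds / shards;
    }

    public static void main(String[] args) {
        long[] rates = {100, 1000};       // average and peak docs/s quoted in this thread
        long[] intervals = {15 * 60, 15}; // current 15 min vs suggested 15 s
        for (long rate : rates) {
            for (long interval : intervals) {
                System.out.printf("%d docs/s, hard commit every %d s -> ~%d docs per core%n",
                        rate, interval, docsPerCorePerTlog(rate, interval, 3));
            }
        }
    }
}
```

With a 15-minute interval that is roughly 30,000 documents per core at the average rate and 300,000 at peak; with 15 seconds it drops to 500 and 5,000.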
> >
> >
> > -Scott
> >
> > On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega <danielortegauf...@gmail.com>
> > wrote:
> >
> > > Hi Scott,
> > >
> > > In our indexing service we are using that client too
> > > (org.apache.solr.client.solrj.impl.CloudSolrClient) :)
> > >
> > > This is our Update Request Processor chain configuration:
> > >
> > > <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory"
> > >                  name="signature">
> > >   <bool name="enabled">true</bool>
> > >   <str name="signatureField">hash</str>
> > >   <bool name="overwriteDupes">false</bool>
> > >   <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > </updateProcessor>
> > >
> > > <updateRequestProcessorChain processor="signature" name="dedupe">
> > >   <processor class="solr.LogUpdateProcessorFactory" />
> > >   <processor class="solr.RunUpdateProcessorFactory" />
> > > </updateRequestProcessorChain>
> > >
> > > <!-- de-duplication process explained in:
> > >      https://cwiki.apache.org/confluence/display/solr/De-Duplication -->
> > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > >   <lst name="defaults">
> > >     <str name="update.chain">dedupe</str>
> > >   </lst>
> > > </requestHandler>
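For reference, this is roughly what the signature step does for each document. As far as I know, when no fields parameter is configured (as in the chain above), SignatureUpdateProcessorFactory uses all fields of the document, so every update here hashes all ~300 fields. With overwriteDupes=false it only fills in the hash field and does not replace documents that share a signature. This is a simplified sketch, not Solr's code: MD5 stands in for Lookup3Signature, and hashing fields in sorted-name order is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified sketch of signature-based de-duplication (MD5 as a stand-in hash).
class SignatureSketch {
    static String signature(Map<String, String> fields) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
        // Sorted iteration makes the hash independent of field insertion order.
        for (Map.Entry<String, String> f : new TreeMap<>(fields).entrySet()) {
            md.update(f.getKey().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator so "ab"+"c" hashes differently from "a"+"bc"
            md.update(f.getValue().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "42");
        doc.put("title", "sample advert");
        // With overwriteDupes=false the value is only stored in the "hash" field;
        // documents sharing it are neither deleted nor replaced.
        doc.put("hash", signature(doc));
        System.out.println(doc.get("hash"));
    }
}
```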
> > >
> > > Thanks for your reply :)
> > >
> > > - Dani
> > >
> > > 2017-08-24 14:49 GMT+02:00 Scott Stults <sstults@opensourceconnections.com>:
> > >
> > > > Hi Daniel,
> > > >
> > > > SolrJ has a few client implementations to choose from: CloudSolrClient,
> > > > ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You said
> > > > your query service uses CloudSolrClient, but it would be good to verify
> > > > which implementation your indexing service uses.
> > > >
> > > > One of the problems you might be having is with your deduplication step.
> > > > Can you post your Update Request Processor Chain?
> > > >
> > > >
> > > > -Scott
> > > >
> > > >
> > > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega <danielortegauf...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Scott,
> > > > >
> > > > > - *Can you describe the process that queries the DB and sends records
> > > > > to Solr?*
> > > > >
> > > > > We enqueue ids during every ORACLE transaction (on inserts/updates).
> > > > >
> > > > > An application dequeues every id and performs queries against dozens of
> > > > > tables in the relational model to retrieve the fields needed to build
> > > > > the document. As we know that we modify the same ORACLE row in
> > > > > different (but consecutive) transactions, we store only the last
> > > > > version of each modified document in a map data structure.
> > > > >
> > > > > The application has a configurable interval for sending the documents
> > > > > stored in the map to the update handler using the SolrJ client (we have
> > > > > tested different intervals, from a few milliseconds to several
> > > > > seconds). Currently we are sending all the documents every 15 seconds.
> > > > >
> > > > > This application is developed using Java, Spring and Maven, and we have
> > > > > several instances.
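The "keep only the last version per id, then flush on an interval" step described above can be sketched with a plain map. This is a minimal sketch under stated assumptions: the class name is hypothetical, D stands in for whatever document type the application builds (e.g. SolrInputDocument), and the drained batch would presumably be handed to a single SolrClient.add(...) call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: keep only the latest version of each document per id and
// hand everything over as one batch when the flush interval fires.
class CoalescingBuffer<D> {
    private final Map<String, D> latest = new LinkedHashMap<>();

    // Called once per dequeued id after its document has been rebuilt.
    synchronized void put(String id, D doc) {
        latest.remove(id); // re-insert so the order reflects the most recent change
        latest.put(id, doc);
    }

    // Called by the flush timer; returns the batch and empties the buffer.
    synchronized List<D> drain() {
        List<D> batch = new ArrayList<>(latest.values());
        latest.clear();
        return batch;
    }

    public static void main(String[] args) {
        CoalescingBuffer<String> buffer = new CoalescingBuffer<>();
        buffer.put("ad-1", "ad-1 v1");
        buffer.put("ad-2", "ad-2 v1");
        buffer.put("ad-1", "ad-1 v2"); // a later transaction on the same row
        System.out.println(buffer.drain()); // [ad-2 v1, ad-1 v2]
    }
}
```

In the real service the flush would presumably be driven by a timer (for example a ScheduledExecutorService at the 15-second interval mentioned above); coalescing means each advert reaches Solr at most once per window.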
> > > > >
> > > > > - *Is it a SolrJ-based application?*
> > > > >
> > > > > Yes, it is. We aren't using the latest version of the SolrJ client (we
> > > > > are currently using SolrJ v6.3.0).
> > > > >
> > > > > - *If it is, which client package are you using?*
> > > > >
> > > > > I don't know exactly what you mean by 'client package' :)
> > > > >
> > > > > - *How many documents do you send at once?*
> > > > >
> > > > > It depends on the interval described above and on the number of
> > > > > transactions executed in our relational database: from dozens to a few
> > > > > hundred (and even thousands).
> > > > >
> > > > > - *Are you sending your indexing or query traffic through a load
> > > > > balancer?*
> > > > >
> > > > > We aren't using a load balancer for indexing, but all our REST query
> > > > > services sit behind HAProxy (using the 'leastconn' algorithm). The REST
> > > > > query services perform queries using the CloudSolrClient.
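HAProxy's 'leastconn' balancing sends each new connection to the server that currently has the fewest active connections. A simplified sketch of that choice (hypothetical class name; HAProxy additionally takes server weights into account and round-robins among equally loaded servers, which this ignores):

```java
// Simplified sketch of least-connections selection: pick the backend with
// the fewest active connections (ties go to the lowest index here).
class LeastConn {
    static int pick(int[] activeConnections) {
        int best = 0;
        for (int i = 1; i < activeConnections.length; i++) {
            if (activeConnections[i] < activeConnections[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] active = {12, 7, 9}; // hypothetical connection counts per REST service instance
        System.out.println("send to backend " + pick(active)); // send to backend 1
    }
}
```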
> > > > >
> > > > > Thanks for your reply. If you need any further information, don't
> > > > > hesitate to ask.
> > > > >
> > > > > Daniel
> > > > >
> > > > > 2017-08-23 14:57 GMT+02:00 Scott Stults <sstults@opensourceconnections.com>:
> > > > >
> > > > > > Hi Daniel,
> > > > > >
> > > > > > Great background information about your setup! I've got just a few
> > > > > > more questions:
> > > > > >
> > > > > > - Can you describe the process that queries the DB and sends records
> > > > > > to Solr?
> > > > > > - Is it a SolrJ-based application?
> > > > > > - If it is, which client package are you using?
> > > > > > - How many documents do you send at once?
> > > > > > - Are you sending your indexing or query traffic through a load
> > > > > > balancer?
> > > > > >
> > > > > > If you're sending documents to each replica as fast as they can take
> > > > > > them, you might be seeing a bottleneck at the shard leaders. The SolrJ
> > > > > > CloudSolrClient finds out from ZooKeeper which nodes are the shard
> > > > > > leaders and sends docs directly to them.
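The leader-routing idea can be sketched as follows: with the default compositeId router each shard owns a contiguous slice of the 32-bit hash space, the document id is hashed, and the client sends the document to the leader of the shard whose slice contains that hash. In this sketch String.hashCode() is only a stand-in (Solr's compositeId router uses MurmurHash3), the even split only approximates Solr's real range boundaries, and the leader URLs are hypothetical placeholders for what CloudSolrClient reads from ZooKeeper.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of hash-range routing to shard leaders.
class LeaderRouting {
    // Index of the contiguous slice of the signed 32-bit hash space that holds h.
    static int shardFor(int h, int numShards) {
        long offset = (long) h - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        long width = (1L << 32) / numShards;        // floor division
        // The last slice absorbs any remainder left by the floor division.
        return (int) Math.min(offset / width, numShards - 1);
    }

    public static void main(String[] args) {
        // Hypothetical leader URLs, one per shard.
        List<String> leaders = Arrays.asList(
                "http://node-a:8983/solr",
                "http://node-b:8983/solr",
                "http://node-c:8983/solr");
        for (String id : Arrays.asList("ad-1001", "ad-1002", "ad-1003")) {
            int shard = shardFor(id.hashCode(), leaders.size()); // stand-in hash
            System.out.println(id + " -> shard" + (shard + 1) + " leader " + leaders.get(shard));
        }
    }
}
```

Because every update for a shard goes through that shard's leader before being forwarded to its replicas, the leaders are the natural place for an indexing bottleneck to show up.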
> > > > > >
> > > > > >
> > > > > > -Scott
> > > > > >
> > > > > > On Tue, Aug 22, 2017 at 2:16 PM, Daniel Ortega <danielortegauf...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > *Main Problems*
> > > > > > >
> > > > > > > We are involved in a migration from a Solr Master/Slave
> > > > > > > infrastructure to a SolrCloud infrastructure.
> > > > > > >
> > > > > > > The main problems that we have now are:
> > > > > > >
> > > > > > >    - Excessive resource consumption: we currently have 5 instances
> > > > > > >    with 80 processors/768 GB RAM each, using SSD drives, and they
> > > > > > >    can't handle the load that our other architecture handles. In our
> > > > > > >    Master/Slave architecture we have only 7 virtual machines with
> > > > > > >    lower specs (4 processors and 16 GB RAM each, also using SSD
> > > > > > >    drives). So, at the moment our SolrCloud infrastructure is
> > > > > > >    consuming several dozen times more resources than our Solr
> > > > > > >    Master/Slave infrastructure.
> > > > > > >    - Despite spending more resources, we have worse query times
> > > > > > >    (compared to Solr in the Master/Slave architecture).
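Totalling the hardware described above, as plain arithmetic (core and RAM figures taken from this message, nothing else assumed; class name hypothetical):

```java
// Plain arithmetic: total cores and RAM in each deployment described above.
class ResourceRatio {
    static int total(int machines, int perMachine) {
        return machines * perMachine;
    }

    static double ratio(int a, int b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        int cloudCores = total(5, 80);  // 400
        int cloudRamGb = total(5, 768); // 3840
        int msCores = total(7, 4);      // 28
        int msRamGb = total(7, 16);     // 112
        System.out.printf("cores: %d vs %d (%.1fx)%n",
                cloudCores, msCores, ratio(cloudCores, msCores));
        System.out.printf("RAM:   %d GB vs %d GB (%.1fx)%n",
                cloudRamGb, msRamGb, ratio(cloudRamGb, msRamGb));
    }
}
```

That is 400 vs 28 cores (about 14x) and 3,840 GB vs 112 GB of RAM (about 34x).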
> > > > > > >
> > > > > > > *Search infrastructure (SolrCloud infrastructure)*
> > > > > > >
> > > > > > > As we cannot use the DIH handler (which is what we use in Solr
> > > > > > > Master/Slave), we have developed an application which reads every
> > > > > > > transaction from Oracle, builds a document collection by querying the
> > > > > > > database and sends the result to the */update* handler every 200
> > > > > > > milliseconds using the SolrJ client. This application tries to
> > > > > > > remove possible duplicates within each update window, but we are
> > > > > > > also using Solr's de-duplication techniques
> > > > > > > <https://cwiki.apache.org/confluence/display/solr/De-Duplication>.
> > > > > > >
> > > > > > > We are indexing ~100 documents per second (with peaks of ~1000
> > > > > > > documents per second).
> > > > > > >
> > > > > > > Every search query is centralized in another application, which
> > > > > > > exposes a DSL behind a REST API and also uses the SolrJ client to
> > > > > > > perform queries. We have peaks of 2000 QPS.
> > > > > > >
> > > > > > > *Cluster structure (SolrCloud infrastructure)*
> > > > > > >
> > > > > > > At the moment, the cluster has 30 SolrCloud instances with the same
> > > > > > > specs (same physical hosts, same JVM settings, etc.).
> > > > > > >
> > > > > > > *Main collection*
> > > > > > >
> > > > > > > In our use case we are basically using this collection as a NoSQL
> > > > > > > database. Our document is composed of about 300 fields that
> > > > > > > represent an advert, and is a denormalization of its relational
> > > > > > > representation in Oracle.
> > > > > > >
> > > > > > > We are using all our nodes to store the collection in 3 shards, so
> > > > > > > each shard has 10 replicas.
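With this layout each incoming document is indexed by all 10 replicas of its shard (in Solr 6.x every replica is a full NRT replica; in Master/Slave only the master indexes and the slaves copy segment files), and each distributed query needs at least one request to a replica of every shard. A back-of-the-envelope sketch using the figures from this message (~100 docs/s average, ~1000 docs/s peak, 2000 QPS peak); the class and method names are hypothetical:

```java
// Back-of-the-envelope sketch of indexing and query fan-out for a
// 3-shard x 10-replica collection (figures from this thread).
class TopologyLoad {
    // Every replica of a shard indexes each document routed to that shard,
    // so cluster-wide indexing work = incoming docs/s x replicas per shard.
    static long indexOpsPerSecond(long docsPerSecond, int replicasPerShard) {
        return docsPerSecond * replicasPerShard;
    }

    // A distributed query sends at least one request to one replica of each
    // shard, so this is a lower bound on shard-level requests per replica.
    static double minShardRequestsPerReplica(long qps, int shards, int totalReplicas) {
        return (double) qps * shards / totalReplicas;
    }

    public static void main(String[] args) {
        System.out.println("index ops/s (avg):  " + indexOpsPerSecond(100, 10));
        System.out.println("index ops/s (peak): " + indexOpsPerSecond(1000, 10));
        System.out.println("min shard requests/s per replica at 2000 QPS: "
                + minShardRequestsPerReplica(2000, 3, 30));
    }
}
```

That works out to about 1,000 (peak 10,000) document-indexing operations per second across the cluster, and at least 200 shard-level requests per second per replica at peak QPS.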
> > > > > > >
> > > > > > > At the moment, we are only indexing a subset of the adverts stored
> > > > > > > in Oracle, but our goal is to store all the ads that we have in the
> > > > > > > DB (a few tens of millions of documents). We have NRT requirements,
> > > > > > > so we need to index every document as soon as possible once it's
> > > > > > > changed in Oracle.
> > > > > > >
> > > > > > > We have defined the properties of each field (whether it's
> > > > > > > stored/indexed or not, whether it should be defined as a DocValue,
> > > > > > > etc.) considering how that field is used.
> > > > > > >
> > > > > > > *Index size (SolrCloud infrastructure)*
> > > > > > >
> > > > > > > The index size is currently above 6 GB, storing 1,300,000 documents
> > > > > > > in each shard. So, we are storing 3,900,000 documents and the total
> > > > > > > index size is 18 GB.
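A quick footprint check from these numbers, using 6 GB per shard as a lower bound and assuming (hypothetically) that the 30 replicas are spread evenly over the 5 physical hosts, i.e. 6 replicas per host; the class and method names are hypothetical:

```java
// Footprint arithmetic for 3 shards x 10 replicas, assuming an even spread of
// replicas over the physical hosts (an assumption, not stated in the thread).
class IndexFootprint {
    static double clusterGb(double perShardGb, int shards, int replicasPerShard) {
        return perShardGb * shards * replicasPerShard;
    }

    static double perHostGb(double perShardGb, int shards, int replicasPerShard, int hosts) {
        return clusterGb(perShardGb, shards, replicasPerShard) / hosts;
    }

    public static void main(String[] args) {
        double perShardGb = 6.0; // "above 6 GB" per shard, used as a lower bound
        System.out.println("all replicas: " + clusterGb(perShardGb, 3, 10) + " GB");
        System.out.println("per host:     " + perHostGb(perShardGb, 3, 10, 5) + " GB");
    }
}
```

Even counting every replica, that is about 180 GB across the cluster and about 36 GB per host, a small fraction of each host's 768 GB of RAM.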
> > > > > > >
> > > > > > > *Indexing (SolrCloud infrastructure)*
> > > > > > >
> > > > > > > The commits *aren't* triggered by the application described above.
> > > > > > > The hard commit/soft commit intervals are configured in Solr:
> > > > > > >
> > > > > > >    - *HardCommit:* every 15 minutes (with openSearcher = false)
> > > > > > >    - *SoftCommit:* every 5 seconds
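Each soft commit opens a new searcher on the core, and a new searcher starts with fresh caches (filterCache, queryResultCache, documentCache) that are autowarmed according to their autowarmCount; hard commits with openSearcher=false do not open searchers. Assuming continuous indexing reaches all 30 replicas (one core per instance), a 5-second soft commit interval implies roughly this much searcher churn (hypothetical helper, plain arithmetic):

```java
// Back-of-the-envelope sketch of searcher churn caused by autoSoftCommit.
class SoftCommitChurn {
    // New searchers opened per minute across the cluster, assuming every core
    // receives updates continuously so its soft commit fires every interval.
    static double searcherOpensPerMinute(int cores, int softCommitSeconds) {
        return cores * 60.0 / softCommitSeconds;
    }

    public static void main(String[] args) {
        int cores = 30;            // 3 shards x 10 replicas, one core per instance
        int softCommitSeconds = 5; // autoSoftCommit interval from this thread
        System.out.println("new searchers per minute: "
                + searcherOpensPerMinute(cores, softCommitSeconds));
    }
}
```

That is about 360 new searchers per minute across the cluster, each with caches that live roughly 5 seconds.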
> > > > > > >
> > > > > > > *Apache Solr Version*
> > > > > > >
> > > > > > > We are currently using the latest version of Solr (6.6.0) under an
> > > > > > > Oracle JVM (Java(TM) SE Runtime Environment (build 1.8.0_131-b11),
> > > > > > > 64-bit) in both deployments.
> > > > > > >
> > > > > > > The question is... What is wrong here?!?!?!
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> > > > > > | 434.409.2780
> > > > > > http://www.opensourceconnections.com
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



