Is 50K the batch size you are using to ingest into Solr?  If that's the
case, it may be too high. You may want to start with a batch size of
100-1000, depending on your document size, and gradually increase it until
performance starts to degrade.
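
Here is a rough sketch of that ramp-up, in case it helps. It is only an
illustration under assumptions: `send` is a placeholder for whatever submits
one batch to Solr in your setup (it is not a Solr or DIH API), and the names
`batched`, `docs_per_second` and `find_batch_size` are made up for this
example. It tries the candidate sizes in increasing order and stops at the
first one that is slower than the best size seen so far.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch


def docs_per_second(
    docs: Sequence[dict],
    size: int,
    send: Callable[[List[dict]], None],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Send all docs in batches of `size` and return the measured rate."""
    start = clock()
    count = 0
    for batch in batched(docs, size):
        send(batch)  # hypothetical: submits one batch to Solr
        count += len(batch)
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else float("inf")


def find_batch_size(
    docs: Sequence[dict],
    sizes: Iterable[int],
    send: Callable[[List[dict]], None],
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Return the last size before throughput started to degrade."""
    candidates = sorted(sizes)
    if not candidates:
        raise ValueError("sizes must not be empty")
    best_size, best_rate = candidates[0], -1.0
    for size in candidates:
        rate = docs_per_second(docs, size, send, clock)
        if rate < best_rate:
            break  # throughput dropped, keep the previous best
        best_size, best_rate = size, rate
    return best_size
```

You would call it with a sample of your documents and a list such as
`[100, 500, 1000, 5000]`. Since your numbers change with the replica count,
it is worth running the measurement against a collection that already has
its replicas.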

On Wed, Jun 7, 2017 at 5:51 AM, Isart Montane <[email protected]>
wrote:

> Hi,
>
> The cluster is running on EC2 using 5x r3.xlarge instances and disks are
> 1TB gp2 EBS.
>
> I will try to get the logs that Susheel requested but it's not an easy
> task.
>
> When indexing there's very little IO.
>
> Solr is started with the following flags:
> ```
> /usr/lib/jvm/java-8-oracle/bin/java
>   -server
>   -Xms15340m
>   -Xmx15340m
>   -XX:NewRatio=3
>   -XX:SurvivorRatio=4
>   -XX:TargetSurvivorRatio=90
>   -XX:MaxTenuringThreshold=8
>   -XX:+UseConcMarkSweepGC
>   -XX:+UseParNewGC
>   -XX:ConcGCThreads=4
>   -XX:ParallelGCThreads=4
>   -XX:+CMSScavengeBeforeRemark
>   -XX:PretenureSizeThreshold=64m
>   -XX:+UseCMSInitiatingOccupancyOnly
>   -XX:CMSInitiatingOccupancyFraction=50
>   -XX:CMSMaxAbortablePrecleanTime=6000
>   -XX:+CMSParallelRemarkEnabled
>   -XX:+ParallelRefProcEnabled
>   -XX:CompressedClassSpaceSize=250m
>   -verbose:gc
>   -XX:+PrintHeapAtGC
>   -XX:+PrintGCDetails
>   -XX:+PrintGCDateStamps
>   -XX:+PrintGCTimeStamps
>   -XX:+PrintTenuringDistribution
>   -XX:+PrintGCApplicationStoppedTime
>   -Xloggc:/data/solr/logs/solr_gc.log
>   -XX:+UseGCLogFileRotation
>   -XX:NumberOfGCLogFiles=9
>   -XX:GCLogFileSize=20M
>   -Dcom.sun.management.jmxremote
>   -Dcom.sun.management.jmxremote.local.only=false
>   -Dcom.sun.management.jmxremote.ssl=false
>   -Dcom.sun.management.jmxremote.authenticate=false
>   -Dcom.sun.management.jmxremote.port=18983
>   -Dcom.sun.management.jmxremote.rmi.port=18983
>   -DzkClientTimeout=15000
>   -DzkHost=zk1,zk2,zk3
>   -Dsolr.log.dir=/data/solr/logs
>   -Djetty.port=8983
>   -DSTOP.PORT=7983
>   -DSTOP.KEY=solrrocks
>   -Duser.timezone=UTC
>   -Djetty.home=/home/solr/solr/server
>   -Dsolr.solr.home=/data/solr/data
>   -Dsolr.install.dir=/home/solr/solr
>   -Dlog4j.configuration=file:/data/solr/log4j.properties
>   -Xss256k
>   -Dsolr.log.muteconsole
>   -XX:OnOutOfMemoryError=/home/solr/solr/bin/oom_solr.sh 8983 /data/solr/logs
>   -jar start.jar
>   --module=http
> ```
>
> Not sure if it's related, but when the batches get replicated on the
> replicas they don't seem to respect the batch size on the primary.
>
> That's the insert on the primary (batch is 50k)
> ```
> 2017-06-07 09:46:00.629 INFO  (qtp592179046-260) [c:collection1 s:shard1
> r:core_node17 x:collection1_shard1_replica6] o.a.s.h.d.DocBuilder Import
> completed successfully
> 2017-06-07 09:46:00.638 INFO  (qtp592179046-260) [c:collection1 s:shard1
> r:core_node17 x:collection1_shard1_replica6] o.a.s.h.d.DocBuilder Time
> taken = 0:0:27.717
> 2017-06-07 09:46:00.655 INFO  (qtp592179046-260) [c:collection1 s:shard1
> r:core_node17 x:collection1_shard1_replica6]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica6]
>  webapp=/solr path=/dataimport
> params={optimize=false&startId=489247153&synchronous=
> true&limit=50000&commit=false&clean=false&command=full-
> import&entity=instagram-users-incremental}{add=[489247178,
> 489247179, 489247191, 489247238, 489247256, 489247260, 489247279,
> 489247325, 489247368, 489247369, ... (50000 adds)]} 0 27743
> ```
>
> And that gets replicated to the replicas as many rows like the ones below.
> I thought that splitting batches of 50k rows into batches of 10-20 rows
> might be a problem, but I couldn't find a way to tune that behaviour (and
> I'm not sure there is one).
> ```
> 2017-06-07 09:45:23.640 INFO  (qtp592179046-1808) [c:collection1 s:shard1
> r:core_node13 x:collection1_shard1_replica3]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica3]
>  webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard1_replica2/&
> wt=javabin&version=2}{add=[488928327
> (1569538675702759424), 488928344 (1569538675703808000), 488928391
> (1569538675703808001), 488928406 (1569538675704856576), 488928418
> (1569538675704856577), 488928451 (1569538675705905152), 488928456
> (1569538675706953728), 488928495 (1569538675706953729), 488928538
> (1569538675708002304), 488928548 (1569538675708002305), ... (15 adds)]} 0 1
> 2017-06-07 09:45:23.671 INFO  (qtp592179046-1832) [c:collection1 s:shard2
> r:core_node14 x:collection1_shard2_replica2]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard2_replica2]
>  webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard2_replica1/&
> wt=javabin&version=2}{add=[488928306
> (1569538675703808000), 488928329 (1569538675717439488), 488928331
> (1569538675718488064), 488928332 (1569538675719536640), 488928378
> (1569538675734216704), 488928383 (1569538675735265280), 488928399
> (1569538675735265281), 488928426 (1569538675736313856), 488928438
> (1569538675742605312), 488928471 (1569538675743653888), ... (13 adds)]} 0 1
> 2017-06-07 09:45:23.686 INFO  (qtp592179046-1811) [c:collection1 s:shard1
> r:core_node13 x:collection1_shard1_replica3]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica3]
>  webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard1_replica2/&
> wt=javabin&version=2}{add=[488928827
> (1569538675750993920), 488928833 (1569538675753091072), 488928842
> (1569538675754139648), 488928888 (1569538675754139649), 488928914
> (1569538675755188224), 488928953 (1569538675755188225), 488928958
> (1569538675756236800), 488928969 (1569538675756236801), 488928977
> (1569538675757285376), 488928996 (1569538675757285377), ... (15 adds)]} 0 1
> 2017-06-07 09:45:23.706 INFO  (qtp592179046-1861) [c:collection1 s:shard2
> r:core_node14 x:collection1_shard2_replica2]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard2_replica2]
>  webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard2_replica1/&
> wt=javabin&version=2}{add=[488929020
> (1569538675758333952), 488929023 (1569538675761479680), 488929027
> (1569538675762528256), 488929032 (1569538675763576832), 488929035
> (1569538675763576833), 488929046 (1569538675764625408), 488929051
> (1569538675764625409), 488929131 (1569538675765673984), 488929141
> (1569538675766722560), 488929145 (1569538675766722561), ... (21 adds)]} 0
> 18
> 2017-06-07 09:45:25.213 INFO  (qtp592179046-1845) [c:collection1 s:shard2
> r:core_node14 x:collection1_shard2_replica2]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard2_replica2]
>  webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard2_replica1/&
> wt=javabin&version=2}{add=[488929441
> (1569538675793985536), 488929535 (1569538675795034112), 488929540
> (1569538675795034113), 488929560 (1569538675796082688), 488929597
> (1569538675796082689), 488929639 (1569538677352169472), 488929684
> (1569538677352169473), 488929713 (1569538677353218048), 488929777
> (1569538677353218049), 488929791 (1569538677353218050), ... (19 adds)]} 0
> 9
> ```
>
> On Wed, Jun 7, 2017 at 10:00 AM, Toke Eskildsen <[email protected]> wrote:
>
> > On Tue, 2017-06-06 at 10:51 +0200, Isart Montane wrote:
> > > We are using SolrCloud with 5 nodes, 2 collections, 2 shards each.
> > > The problem we are seeing is a huge drop in writes when the number of
> > > replicas increases.
> > >
> > > When we index (using DIH and batches) a collection with no replicas,
> > > we are able to index at 1800 inserts/sec. That number decreases to
> > > 1200 with 1 replica, 800 with 2 replicas and 400 with 3 replicas and
> > > it keeps getting worse when more replicas are added.
> >
> > That is, as Susheel says, not expected behaviour. If you are running
> > everything on a single physical machine that could be an explanation.
> > What is your hardware-setup?
> > --
> > Toke Eskildsen, Royal Danish Library
> >
>
