Is 50K the batch size you are using to ingest into Solr? If so, it may be too high. You may want to start with a batch size of 100-1000, depending on your document size, and increase it gradually until performance starts to degrade.
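To make the ramp-up idea concrete, here is a rough sketch in Python. It is not tied to any Solr client: `send_batch` is a hypothetical callable standing in for whatever actually posts documents to your cluster, and the candidate sizes and the 10% tolerance are assumptions picked for illustration only.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(docs: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of `docs` holding at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def measure_throughput(
    docs: Sequence[dict],
    size: int,
    send_batch: Callable[[Sequence[dict]], None],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Send every document in batches of `size`; return documents per second."""
    started = clock()
    for batch in chunked(docs, size):
        send_batch(batch)  # hypothetical: replace with your real indexing call
    elapsed = clock() - started
    return len(docs) / elapsed if elapsed > 0 else float("inf")


def find_batch_size(
    docs: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], None],
    candidates: Sequence[int] = (100, 250, 500, 1000, 2500, 5000),
    tolerance: float = 0.10,  # assumed: stop once 10% below the best rate seen
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Try candidate sizes in increasing order and return the fastest one.

    The walk stops early as soon as throughput falls more than `tolerance`
    below the best rate measured so far.
    """
    best_size, best_rate = candidates[0], 0.0
    for size in candidates:
        rate = measure_throughput(docs, size, send_batch, clock)
        if rate > best_rate:
            best_size, best_rate = size, rate
        elif rate < best_rate * (1 - tolerance):
            break
    return best_size
```

Run it against a representative sample of your documents with the replicas in place, since the replication cost is part of what you are trying to measure. With DIH, the knob to vary would presumably be the `limit` parameter of the import request rather than a client-side batch.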
On Wed, Jun 7, 2017 at 5:51 AM, Isart Montane <[email protected]> wrote:
> Hi,
>
> The cluster is running on EC2 using 5x r3.xlarge instances and disks are
> 1TB gp2 EBS.
>
> I will try to get the logs that Susheel requested but it's not an easy
> task.
>
> When indexing there's very little IO.
>
> Solr is started with the following flags:
> ```
> /usr/lib/jvm/java-8-oracle/bin/java
> -server
> -Xms15340m
> -Xmx15340m
> -XX:NewRatio=3
> -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -XX:MaxTenuringThreshold=8
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:ConcGCThreads=4
> -XX:ParallelGCThreads=4
> -XX:+CMSScavengeBeforeRemark
> -XX:PretenureSizeThreshold=64m
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled
> -XX:CompressedClassSpaceSize=250m
> -verbose:gc
> -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/data/solr/logs/solr_gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9
> -XX:GCLogFileSize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983
> -DzkClientTimeout=15000
> -DzkHost=zk1,zk2,zk3
> -Dsolr.log.dir=/data/solr/logs
> -Djetty.port=8983
> -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks
> -Duser.timezone=UTC
> -Djetty.home=/home/solr/solr/server
> -Dsolr.solr.home=/data/solr/data
> -Dsolr.install.dir=/home/solr/solr
> -Dlog4j.configuration=file:/data/solr/log4j.properties
> -Xss256k
> -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/home/solr/solr/bin/oom_solr.sh 8983 /data/solr/logs
> -jar start.jar
> --module=http
> ```
>
> Not sure if it's related, but when the batches get replicated on the
> replicas they don't seem to respect the batch size on the primary.
>
> That's the insert on the primary (batch is 50k)
> ```
> 2017-06-07 09:46:00.629 INFO (qtp592179046-260) [c:collection1 s:shard1
> r:core_node17 x:collection1_shard1_replica6] o.a.s.h.d.DocBuilder Import
> completed successfully
> 2017-06-07 09:46:00.638 INFO (qtp592179046-260) [c:collection1 s:shard1
> r:core_node17 x:collection1_shard1_replica6] o.a.s.h.d.DocBuilder Time
> taken = 0:0:27.717
> 2017-06-07 09:46:00.655 INFO (qtp592179046-260) [c:collection1 s:shard1
> r:core_node17 x:collection1_shard1_replica6]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica6]
> webapp=/solr path=/dataimport
> params={optimize=false&startId=489247153&synchronous=
> true&limit=50000&commit=false&clean=false&command=full-
> import&entity=instagram-users-incremental}{add=[489247178,
> 489247179, 489247191, 489247238, 489247256, 489247260, 489247279,
> 489247325, 489247368, 489247369, ... (50000 adds)]} 0 27743
> ```
>
> And that gets replicated to the replicas as many rows like that. I thought
> that replicating batches of 50k rows to batches of 10-20 rows might be a
> problem, but I couldn't find a way to tune that behaviour (and I'm not sure
> there's one)
> ```
> 2017-06-07 09:45:23.640 INFO (qtp592179046-1808) [c:collection1 s:shard1
> r:core_node13 x:collection1_shard1_replica3]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica3]
> webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard1_replica2/&
> wt=javabin&version=2}{add=[488928327
> (1569538675702759424), 488928344 (1569538675703808000), 488928391
> (1569538675703808001), 488928406 (1569538675704856576), 488928418
> (1569538675704856577), 488928451 (1569538675705905152), 488928456
> (1569538675706953728), 488928495 (1569538675706953729), 488928538
> (1569538675708002304), 488928548 (1569538675708002305), ... (15 adds)]} 0 1
> 2017-06-07 09:45:23.671 INFO (qtp592179046-1832) [c:collection1 s:shard2
> r:core_node14 x:collection1_shard2_replica2]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard2_replica2]
> webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard2_replica1/&
> wt=javabin&version=2}{add=[488928306
> (1569538675703808000), 488928329 (1569538675717439488), 488928331
> (1569538675718488064), 488928332 (1569538675719536640), 488928378
> (1569538675734216704), 488928383 (1569538675735265280), 488928399
> (1569538675735265281), 488928426 (1569538675736313856), 488928438
> (1569538675742605312), 488928471 (1569538675743653888), ... (13 adds)]} 0 1
> 2017-06-07 09:45:23.686 INFO (qtp592179046-1811) [c:collection1 s:shard1
> r:core_node13 x:collection1_shard1_replica3]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica3]
> webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard1_replica2/&
> wt=javabin&version=2}{add=[488928827
> (1569538675750993920), 488928833 (1569538675753091072), 488928842
> (1569538675754139648), 488928888 (1569538675754139649), 488928914
> (1569538675755188224), 488928953 (1569538675755188225), 488928958
> (1569538675756236800), 488928969 (1569538675756236801), 488928977
> (1569538675757285376), 488928996 (1569538675757285377), ... (15 adds)]} 0 1
> 2017-06-07 09:45:23.706 INFO (qtp592179046-1861) [c:collection1 s:shard2
> r:core_node14 x:collection1_shard2_replica2]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard2_replica2]
> webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard2_replica1/&
> wt=javabin&version=2}{add=[488929020
> (1569538675758333952), 488929023 (1569538675761479680), 488929027
> (1569538675762528256), 488929032 (1569538675763576832), 488929035
> (1569538675763576833), 488929046 (1569538675764625408), 488929051
> (1569538675764625409), 488929131 (1569538675765673984), 488929141
> (1569538675766722560), 488929145 (1569538675766722561), ... (21 adds)]} 0
> 18
> 2017-06-07 09:45:25.213 INFO (qtp592179046-1845) [c:collection1 s:shard2
> r:core_node14 x:collection1_shard2_replica2]
> o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard2_replica2]
> webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=
> http://10.0.0.159:8983/solr/collection1_shard2_replica1/&
> wt=javabin&version=2}{add=[488929441
> (1569538675793985536), 488929535 (1569538675795034112), 488929540
> (1569538675795034113), 488929560 (1569538675796082688), 488929597
> (1569538675796082689), 488929639 (1569538677352169472), 488929684
> (1569538677352169473), 488929713 (1569538677353218048), 488929777
> (1569538677353218049), 488929791 (1569538677353218050), ... (19 adds)]} 0
> 9
> ```
>
> On Wed, Jun 7, 2017 at 10:00 AM, Toke Eskildsen <[email protected]> wrote:
>
> > On Tue, 2017-06-06 at 10:51 +0200, Isart Montane wrote:
> > > We are using SolrCloud with 5 nodes, 2 collections, 2 shards each.
> > > The problem we are seeing is a huge drop on writes when the number of
> > > replicas increases.
> > >
> > > When we index (using DIH and batches) a collection with no replicas,
> > > we are able to index at 1800 inserts/sec. That number decreases to
> > > 1200 with 1 replica, 800 with 2 replicas and 400 with 3 replicas and
> > > it keeps getting worse when more replicas are added.
> >
> > That is, as Susheel says, not expected behaviour. If you are running
> > everything on a single physical machine that could be an explanation.
> > What is your hardware-setup?
> > --
> > Toke Eskildsen, Royal Danish Library
> >
>
