Hi,

The Flume config looks OK to me. One minor thing: I'd suggest trying the memory channel, which can speed things up a little. The morphline part might be a bottleneck; could you please share its config as well? Some sample input files would also be useful for debugging.
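For illustration, switching to a memory channel might look roughly like the fragment below. This is only a sketch: the channel name MemChannel and the capacity values are assumptions chosen for the example, not tuned recommendations, and the sink batch size is lowered on the assumption that a sink batch should not exceed the channel's transactionCapacity.

```properties
# Hypothetical channel name (MemChannel); values are illustrative, not tuned.
agent.channels = MemChannel

# Buffer events in memory instead of on disk.
agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 10000
agent.channels.MemChannel.transactionCapacity = 1000

# Wire the existing source and sink to the new channel.
agent.sources.SpoolDirSrc.channels = MemChannel
agent.sinks.SolrSink.channel = MemChannel

# Assumption: keep the sink batch at or below transactionCapacity.
agent.sinks.SolrSink.batchSize = 1000
```

Keep in mind that a memory channel loses any buffered events if the agent stops or crashes, so it fits best when the input files can be replayed.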
Besides these, I'd recommend profiling it with a Java profiler (e.g. jvisualvm).

Regards,
Denes

On Fri, Feb 17, 2017 at 12:00 AM Anatharaman, Srinatha (Contractor) <srinatha_ananthara...@comcast.com> wrote:

Hi,

I have a large set of small files; each file is around 7–10 K in size. In total I have 350K files, around 6 GB.

I have changed my Flume configuration with many options, but whatever I change, Solr takes 2 seconds to ingest each file.

agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source
agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/final
agent.sources.SpoolDirSrc.basenameHeader = true
#agent.sources.SpoolDirSrc.batchSize = 100000
agent.sources.SpoolDirSrc.fileHeader = true
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Use a channel that buffers events in memory
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 1000
agent.channels.FileChannel.transactionCapacity = 1000
#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
#agent.sinks.SolrSink.batchsize = 100000
#agent.sinks.SolrSink.batchDurationMillis = 5000
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
agent.sinks.SolrSink.batchSize = 100000
agent.sinks.SolrSink.txnEventMax = 10000000

agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel

My collection is on 2 shards and 1 replication.

Kindly let me know how I can make this better.

Regards,
~Sri