Hi Gwen,

I have gone through this link while trying to set up my Logstash Kafka handler:
https://github.com/joekiller/logstash-kafka

I could achieve what I was looking for, but throughput drops badly when writing a multi-GB file. I guess there should be some way to parallelise what is currently a single sequential process; a rough sketch of what I have in mind is below.
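Something along these lines, perhaps: split the file by byte ranges and give each worker its own producer. This is only a sketch and not tested; it assumes the kafka-python client, a broker at localhost:9092, and placeholder names ('big.log', topic 'logs', 4 workers).

# Untested sketch: one process per byte range, one producer per process.
import os
from multiprocessing import Process

from kafka import KafkaProducer

def ship_chunk(path, start, end, topic='logs'):
    # Each process gets its own producer; sends are async and the client
    # batches them internally.
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    with open(path, 'rb') as f:
        if start > 0:
            # Step back one byte and discard a line, so we resume at the
            # first line that *starts* inside [start, end). The discarded
            # line belongs to the previous worker's range.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            producer.send(topic, line.rstrip(b'\n'))
    producer.flush()  # block until everything queued has been sent

if __name__ == '__main__':
    path, workers = 'big.log', 4
    size = os.path.getsize(path)
    step = size // workers
    procs = [Process(target=ship_chunk,
                     args=(path, i * step,
                           size if i == workers - 1 else (i + 1) * step))
             for i in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

A line that straddles a chunk boundary is sent by the worker whose range contains its first byte, so nothing is lost or duplicated. Whether this actually beats tuning Logstash itself I can't say without measuring, as you suggested below.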
Thanks!

On Sun, Feb 8, 2015 at 8:06 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> I'm wondering how much of the time is spent by Logstash reading and
> processing the log vs. time spent sending data to Kafka. Also, I'm not
> familiar with Logstash internals; perhaps it can be tuned to send the
> data to Kafka in larger batches?
>
> At the moment it's difficult to tell where the slowdown is. More
> information about the breakdown of time would help.
>
> You can try Flume's SpoolingDirectory source with a Kafka Channel or
> Sink and see whether you get better performance out of other tools.
>
> Gwen
>
> On Sun, Feb 8, 2015 at 12:06 AM, Vineet Mishra <clearmido...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I have some log files of around 30 GB, and I am trying to event-process
> > these logs by pushing them to Kafka. I can clearly see that the
> > throughput achieved while publishing these events to Kafka is quite
> > slow.
> >
> > As mentioned, for a single 30 GB log file Logstash has been
> > continuously emitting to Kafka for more than 2 days, yet it has
> > processed only about 60% of the log data. I am looking for a way to
> > make publishing events to Kafka more efficient, since at this
> > ingestion rate it is not a viable way to move ahead.
> >
> > Looking for performance improvements here.
> >
> > Expert advice required!
> >
> > Thanks!
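P.S. On the point above about sending to Kafka in larger batches: with a plain producer, the knobs involved would look roughly like this (kafka-python again; the values are illustrative, not tested).

from kafka import KafkaProducer

# Illustrative values only: a bigger batch size plus a small linger lets
# the client coalesce many events per request; compression trades CPU for
# fewer bytes on the wire.
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    batch_size=128 * 1024,    # bytes buffered per partition before sending
    linger_ms=100,            # wait up to 100 ms for a batch to fill
    compression_type='gzip',  # compress batches before sending
    acks=1)                   # leader-only acks: faster, less durable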