Thanks David. This looks interesting. Will definitely test it out to see whether it solves our problem.
On Thu, Jan 29, 2015 at 8:29 AM, David Morales <dmora...@stratio.com> wrote:

> The existing "tail" source is not the best choice in your scenario, as you
> have pointed out.
>
> SpoolDir could be a solution if your log file rotation interval is very
> short (5 minutes, for example), but then you have to deal with a huge
> number of files in the folder (slower listings).
>
> There is a proposal for a new approach, something that combines the best
> of "tail" and "spoolDir". Take a look here:
>
> https://issues.apache.org/jira/browse/FLUME-2498
>
>
> 2015-01-29 0:24 GMT+01:00 Lakshmanan Muthuraman <lakshma...@tokbox.com>:
>
> > We have been using Flume to solve a very similar use case. Our servers
> > write the log files to a local file system, and then we have a Flume
> > agent that ships the data to Kafka.
> >
> > With Flume you can use an exec source running tail. Though the exec
> > source runs well with tail, there are issues if the agent goes down or
> > the file channel starts building up. If the agent goes down, you can ask
> > the exec tail source to go back n lines or to read from the beginning of
> > the file. The challenge is that we roll our log files daily. What if the
> > agent goes down in the evening? We then have to go back over the entire
> > day's worth of data for reprocessing, which slows down the data flow. We
> > can also go back an arbitrary number of lines, but then we don't know
> > what the right number is. That is the challenge for us. We have tried
> > the spooling directory source, which works, but it requires a different
> > log file rotation policy. We even considered rotating files every
> > minute, but that would still delay the real-time data flow in our
> > Kafka -> Storm -> Elasticsearch pipeline by a minute.
> >
> > We are going to do a PoC on Logstash to see whether it solves the
> > problems we have with Flume.
> >
> > On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. <fot...@gmail.com> wrote:
> >
> > > Hi all,
> > > I'm evaluating using Kafka.
> > >
> > > I liked how Facebook Scribe lets you log to your own machine while a
> > > separate process forwards the messages to the central logger.
> > >
> > > With Kafka it seems that I have to embed the publisher in my app and
> > > deal with any communication problems on the producer side.
> > >
> > > I googled quite a bit trying to find a project that provides a daemon
> > > which reads a log file and sends the lines to the Kafka cluster
> > > (something like tail file.log, but instead of redirecting the output
> > > to the console, sending it to Kafka).
> > >
> > > Does anyone know of something like that?
> > >
> > > Thanks!
> > > Fernando.
>
>
> --
> David Morales de Frías :: +34 607 010 411 :: @dmoralesdf
> <https://twitter.com/dmoralesdf>
>
> <http://www.stratio.com/>
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
> <https://twitter.com/StratioBD>
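For reference, a rough sketch of the exec-source setup Lakshmanan describes, written as a Flume agent configuration. This is only illustrative: the agent, source, channel, and sink names are made up, the tail command and channel paths need tuning, and the Kafka sink class and its property names are assumptions (a built-in Kafka sink only appears in newer Flume releases, and the property names differ between versions):

    agent1.sources  = tailsrc
    agent1.channels = filech
    agent1.sinks    = kafkasink

    # Exec source running tail; -F survives file rotation, but lines written
    # while the agent itself is down can still be missed (the issue above).
    agent1.sources.tailsrc.type = exec
    agent1.sources.tailsrc.command = tail -F /var/log/app/app.log
    agent1.sources.tailsrc.channels = filech

    # Durable file channel so a short agent outage does not drop events.
    agent1.channels.filech.type = file
    agent1.channels.filech.checkpointDir = /var/lib/flume/checkpoint
    agent1.channels.filech.dataDirs = /var/lib/flume/data

    # Kafka sink (class name and properties assumed; adjust to your Flume version).
    agent1.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.kafkasink.brokerList = kafka1:9092,kafka2:9092
    agent1.sinks.kafkasink.topic = app-logs
    agent1.sinks.kafkasink.channel = filech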
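And a minimal sketch of the kind of tail-to-Kafka daemon Fernando is asking about, assuming the kafka-python client is installed; the broker address, topic name, and log path are placeholders:

    import time
    from kafka import KafkaProducer  # assumes the kafka-python package

    LOG_PATH = "/var/log/app/app.log"   # placeholder path
    TOPIC = "app-logs"                  # placeholder topic

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    # Follow the file like `tail -f` and ship each new line to Kafka.
    # Note: this simple version handles neither log rotation nor restarts
    # (no offset is persisted), which is exactly the gap discussed above.
    with open(LOG_PATH, "r") as log:
        log.seek(0, 2)                  # start at the end of the file
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.5)         # no new data yet
                continue
            producer.send(TOPIC, line.rstrip("\n").encode("utf-8"))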