Thanks David. This looks interesting. Will definitely test it out to see whether it solves our problem.
On Thu, Jan 29, 2015 at 8:29 AM, David Morales <dmora...@stratio.com> wrote:

> The existing "tail" source is not the best choice in your scenario, as you
> have pointed out.
>
> SpoolDir could be a solution if your log file rotation interval is very
> short (5 minutes, for example), but then you have to deal with a huge
> number of files in the folder (slower listings).
>
> There is a proposal for a new approach, something that combines the best
> of "tail" and "spoolDir". Take a look here:
>
> https://issues.apache.org/jira/browse/FLUME-2498
>
>
> 2015-01-29 0:24 GMT+01:00 Lakshmanan Muthuraman <lakshma...@tokbox.com>:
>
> > We have been using Flume to solve a very similar use case. Our servers
> > write the log files to a local file system, and then we have a Flume
> > agent that ships the data to Kafka.
> >
> > With Flume you can use an exec source running tail. Though the exec
> > source runs well with tail, there are issues if the agent goes down or
> > the file channel starts building up. If the agent goes down, you can ask
> > the exec tail source to go back n lines or to read from the beginning of
> > the file. The challenge is that we roll our log files daily. What if the
> > agent goes down in the evening? We then have to go back over the entire
> > day's worth of data for reprocessing, which slows down the data flow. We
> > can also go back an arbitrary number of lines, but then we don't know
> > what the right number is. That is the challenge for us. We have tried
> > the spooling directory source, which works, but it requires a different
> > log file rotation policy. We even considered rotating files every
> > minute, but that would still delay the real-time data flow in our
> > Kafka -> Storm -> Elasticsearch pipeline by a minute.
> >
> > We are going to do a PoC on Logstash to see whether it solves the
> > problems we have with Flume.
> >
> > On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. <fot...@gmail.com> wrote:
> >
> > > Hi all,
> > > I'm evaluating using Kafka.
> > >
> > > I liked how Facebook Scribe lets you log to your own machine while a
> > > separate process forwards the messages to the central logger.
> > >
> > > With Kafka it seems that I have to embed the publisher in my app and
> > > deal with any communication problems on the producer side.
> > >
> > > I googled quite a bit trying to find a project that provides a daemon
> > > which reads a log file and sends the lines to the Kafka cluster
> > > (something like tail file.log, but instead of redirecting the output
> > > to the console, sending it to Kafka).
> > >
> > > Does anyone know of something like that?
> > >
> > > Thanks!
> > > Fernando.
>
>
> --
> David Morales de Frías :: +34 607 010 411 :: @dmoralesdf
> <https://twitter.com/dmoralesdf>
>
> <http://www.stratio.com/>
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
> <https://twitter.com/StratioBD>
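For reference, a rough sketch of the exec-source setup Lakshmanan describes, written as a Flume agent configuration. This is only illustrative: the agent, source, channel, and sink names are made up, the tail command and channel paths need tuning, and the Kafka sink class and its property names are assumptions (a built-in Kafka sink only appears in newer Flume releases, and the property names differ between versions):

    agent1.sources  = tailsrc
    agent1.channels = filech
    agent1.sinks    = kafkasink

    # Exec source running tail; -F survives file rotation, but lines written
    # while the agent itself is down can still be missed (the issue above).
    agent1.sources.tailsrc.type = exec
    agent1.sources.tailsrc.command = tail -F /var/log/app/app.log
    agent1.sources.tailsrc.channels = filech

    # Durable file channel so a short agent outage does not drop events.
    agent1.channels.filech.type = file
    agent1.channels.filech.checkpointDir = /var/lib/flume/checkpoint
    agent1.channels.filech.dataDirs = /var/lib/flume/data

    # Kafka sink (class name and properties assumed; adjust to your Flume version).
    agent1.sinks.kafkasink.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.kafkasink.brokerList = kafka1:9092,kafka2:9092
    agent1.sinks.kafkasink.topic = app-logs
    agent1.sinks.kafkasink.channel = filech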
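And a minimal sketch of the kind of tail-to-Kafka daemon Fernando is asking about, assuming the kafka-python client is installed; the broker address, topic name, and log path are placeholders:

    import time
    from kafka import KafkaProducer  # assumes the kafka-python package

    LOG_PATH = "/var/log/app/app.log"   # placeholder path
    TOPIC = "app-logs"                  # placeholder topic

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    # Follow the file like `tail -f` and ship each new line to Kafka.
    # Note: this simple version handles neither log rotation nor restarts
    # (no offset is persisted), which is exactly the gap discussed above.
    with open(LOG_PATH, "r") as log:
        log.seek(0, 2)                  # start at the end of the file
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.5)         # no new data yet
                continue
            producer.send(TOPIC, line.rstrip("\n").encode("utf-8"))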