I am asking point 1 because in some cases I could see a line in the logfile of around 2 MB, so I need to know the maximum event size. How do I measure it?
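On my side I can check how long the lines in the logfile actually get with something like the below (just a rough sketch; it assumes GNU awk and coreutils are available on the host holding the log, and uses the /data/logs/test_log path from the config further down the thread):

    # print the length of the longest line in the file
    awk '{ if (length($0) > max) max = length($0) } END { print max }' /data/logs/test_log
    wc -L /data/logs/test_log   # GNU wc only; also reports the longest line length

Both report the longest line in the file, but I still need to know what maximum event size flume itself allows for an exec source.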
On 16 July 2014 10:18, SaravanaKumar TR <saran0081...@gmail.com> wrote:
> Hi Natty,
>
> Please help me to get answers for the below queries.
>
> 1. In the case of an exec source (tail -F <logfile>), is each line in the file considered to be a single event?
> If a line is considered to be an event, what is the maximum event size supported by Flume? I mean, what is the maximum number of characters supported in a line?
>
> 2. When event processing stops, I do not see the "tail -F" command running in the background.
> I have used the options "a1.sources.r1.restart = true" and "a1.sources.r1.logStdErr = true".
> Shouldn't this config send any errors to flume.log if there are issues with tail?
> Shouldn't this config try to restart the "tail -F" if it is not running in the background?
>
> 3. Does Flume support all formats of data in the logfile, or does it have any predefined data formats?
>
> Please help me with these to understand better.
>
>
>
> On 16 July 2014 00:56, Jonathan Natkins <na...@streamsets.com> wrote:
>
>> Saravana,
>>
>> Everything here looks pretty sane. Do you have a record of the events that came in leading up to the agent stopping collection? If you can provide the last file created by the agent, and ideally whatever events had come in but not been written out to your HDFS sink, it might be possible for me to reproduce this issue. Would it be possible to get some sample data from you?
>>
>> Thanks,
>> Natty
>>
>>
>> On Tue, Jul 15, 2014 at 10:26 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>
>>> Hi Natty,
>>>
>>> Just to understand: at present my setting in log4j.properties is "flume.root.logger=INFO,LOGFILE". Do you want me to change it to "flume.root.logger=DEBUG,LOGFILE" and restart the agent?
>>>
>>> When I start the agent, I am already starting it with the command below, so I guess I am already using DEBUG, just from the startup command rather than the config file.
>>>
>>> ../bin/flume-ng agent -c /d0/flume/conf -f /d0/flume/conf/flume-conf.properties -n a1 -Dflume.root.logger=DEBUG,LOGFILE
>>>
>>> If I make some changes to the config "flume-conf.properties" or restart the agent, it works again and starts collecting the data.
>>>
>>> Currently all my logs go to flume.log, and I don't see any exceptions.
>>>
>>> cat flume.log | grep "Exception" doesn't show any.
>>>
>>>
>>> On 15 July 2014 22:24, Jonathan Natkins <na...@streamsets.com> wrote:
>>>
>>>> Hi Saravana,
>>>>
>>>> Our best bet on figuring out what's going on here may be to turn on the debug logging. What I would recommend is stopping your agents, modifying the log4j properties to turn on DEBUG logging for the root logger, and then restarting the agents. Once the agent stops producing new events, send out the logs and I'll be happy to take a look over them.
>>>>
>>>> Does the system begin working again if you restart the agents? Have you noticed any other events correlated with the agent stopping collecting events? Maybe a spike in events or something like that? And for my own peace of mind, if you run `cat /var/log/flume-ng/* | grep "Exception"`, does it bring anything back?
>>>>
>>>> Thanks!
>>>> Natty
>>>>
>>>>
>>>> On Tue, Jul 15, 2014 at 2:55 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>
>>>>> Hi Natty,
>>>>>
>>>>> This is my entire config file.
>>>>>
>>>>> # Name the components on this agent
>>>>> a1.sources = r1
>>>>> a1.sinks = k1
>>>>> a1.channels = c1
>>>>>
>>>>> # Describe/configure the source
>>>>> a1.sources.r1.type = exec
>>>>> a1.sources.r1.command = tail -F /data/logs/test_log
>>>>> a1.sources.r1.restart = true
>>>>> a1.sources.r1.logStdErr = true
>>>>>
>>>>> #a1.sources.r1.batchSize = 2
>>>>>
>>>>> a1.sources.r1.interceptors = i1
>>>>> a1.sources.r1.interceptors.i1.type = regex_filter
>>>>> a1.sources.r1.interceptors.i1.regex = resuming normal operations|Received|Response
>>>>>
>>>>> #a1.sources.r1.interceptors = i2
>>>>> #a1.sources.r1.interceptors.i2.type = timestamp
>>>>> #a1.sources.r1.interceptors.i2.preserveExisting = true
>>>>>
>>>>> # Describe the sink
>>>>> a1.sinks.k1.type = hdfs
>>>>> a1.sinks.k1.hdfs.path = hdfs://testing.sck.com:9000/running/test.sck/date=%Y-%m-%d
>>>>> a1.sinks.k1.hdfs.writeFormat = Text
>>>>> a1.sinks.k1.hdfs.fileType = DataStream
>>>>> a1.sinks.k1.hdfs.filePrefix = events-
>>>>> a1.sinks.k1.hdfs.rollInterval = 600
>>>>> ## need to run Hive queries from time to time to check the long-running process, so we need to commit events to the HDFS files regularly
>>>>> a1.sinks.k1.hdfs.rollCount = 0
>>>>> a1.sinks.k1.hdfs.batchSize = 10
>>>>> a1.sinks.k1.hdfs.rollSize = 0
>>>>> a1.sinks.k1.hdfs.useLocalTimeStamp = true
>>>>>
>>>>> # Use a channel which buffers events in memory
>>>>> a1.channels.c1.type = memory
>>>>> a1.channels.c1.capacity = 10000
>>>>> a1.channels.c1.transactionCapacity = 10000
>>>>>
>>>>> # Bind the source and sink to the channel
>>>>> a1.sources.r1.channels = c1
>>>>> a1.sinks.k1.channel = c1
>>>>>
>>>>>
>>>>> On 14 July 2014 22:54, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>
>>>>>> Hi Saravana,
>>>>>>
>>>>>> What does your sink configuration look like?
>>>>>>
>>>>>> Thanks,
>>>>>> Natty
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 11, 2014 at 11:05 PM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>
>>>>>>> Assuming each line in the logfile is considered an event for Flume:
>>>>>>>
>>>>>>> 1. Do we have any maximum event size defined for the memory/file channel, like a maximum number of characters in a line?
>>>>>>> 2. Does Flume support all formats of data to be processed as events, or do we have any limitations?
>>>>>>>
>>>>>>> I am still trying to understand why Flume stops processing events after some time.
>>>>>>>
>>>>>>> Can someone please help me out here?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Saravana
>>>>>>>
>>>>>>>
>>>>>>> On 11 July 2014 17:49, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am new to Flume and am using Apache Flume 1.5.0. A quick explanation of the setup:
>>>>>>>>
>>>>>>>> Source: exec, a tail -F command on a logfile.
>>>>>>>>
>>>>>>>> Channel: tried with both memory and file channels
>>>>>>>>
>>>>>>>> Sink: HDFS
>>>>>>>>
>>>>>>>> When Flume starts, events are processed properly and moved to HDFS without any issues.
>>>>>>>>
>>>>>>>> But after some time Flume suddenly stops sending events to HDFS.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not seeing any errors in the logfile flume.log either. Please let me know if I am missing any configuration here.
>>>>>>>>
>>>>>>>>
>>>>>>>> Below is the channel configuration defined; I left the remaining settings at their default values.
>>>>>>>>
>>>>>>>>
>>>>>>>> a1.channels.c1.type = FILE
>>>>>>>>
>>>>>>>> a1.channels.c1.transactionCapacity = 100000
>>>>>>>>
>>>>>>>> a1.channels.c1.capacity = 10000000
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Saravana