That's definitely your problem. 20 MB is way too low for this. The amount of memory you'll need will vary depending on the other processes you're running on your system, but I'd recommend at least 1 GB. You should define it exactly where it's defined right now, so instead of the current command, you can run:

"/cv/jvendor/bin/java -Xmx1g -Dflume.root.logger=DEBUG,LOGFILE......"

On Wed, Jul 16, 2014 at 3:03 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:

> I guess I am using default values; from running Flume I could see these
> lines: "/cv/jvendor/bin/java -Xmx20m -Dflume.root.logger=DEBUG,LOGFILE......"
>
> So I guess it takes 20 MB as the Flume agent's memory. My RAM is 128 GB,
> so please suggest how much I can assign as heap memory and where to
> define it.
>
>
> On 16 July 2014 15:05, Jonathan Natkins <na...@streamsets.com> wrote:
>
>> Hey Saravana,
>>
>> I'm attempting to reproduce this, but do you happen to know what the
>> Java heap size is for your Flume agent? This information leads me to
>> believe that you don't have enough memory allocated to the agent, which
>> you may need to do with the -Xmx parameter when you start up your
>> agent. That aside, you can set the byteCapacity parameter on the memory
>> channel to specify how much memory it is allowed to use. It should
>> default to 80% of the Java heap size, but if your heap is too small,
>> this might be a cause of errors.
>>
>> Does anything get written to the log when you try to pass in an event
>> of this size?
>>
>> Thanks,
>> Natty
>> On Wed, Jul 16, 2014 at 1:46 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>
>>> Hi Natty,
>>>
>>> While looking further, I could see that the memory channel stops if a
>>> line comes in greater than 2 MB. Let me know which parameter helps us
>>> define a max event size of about 3 MB.
>>>
>>>
>>> On 16 July 2014 12:46, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>
>>>> I am asking about point 1 because in some cases I could see a line in
>>>> the logfile of around 2 MB, so I need to know the maximum event size.
>>>> How do I measure it?
>>>>
>>>>
>>>> On 16 July 2014 10:18, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>
>>>>> Hi Natty,
>>>>>
>>>>> Please help me get answers to the queries below.
>>>>>
>>>>> 1. In the case of an exec source (tail -F <logfile>), is each line
>>>>> in the file considered to be a single event? If a line is considered
>>>>> to be an event, what is the maximum event size supported by Flume? I
>>>>> mean, what is the maximum number of characters supported in a line?
>>>>>
>>>>> 2. When events stop processing, I am not seeing the "tail -F"
>>>>> command running in the background. I have used options like
>>>>> "a1.sources.r1.restart = true" and "a1.sources.r1.logStdErr = true".
>>>>> Shouldn't this config send any errors from tail to flume.log?
>>>>> Shouldn't it also try to restart the "tail -F" if it is not running
>>>>> in the background?
>>>>>
>>>>> 3. Does Flume support all formats of data in a logfile, or does it
>>>>> have any predefined data formats?
>>>>>
>>>>> Please help me with these to understand better.
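On the "how do I measure it" question: the exec source treats each line of the command's output as one event, so the longest line in the file is a good proxy for the largest event you'll produce. A quick sketch with standard awk, using the logfile path from the configuration quoted below:

    # Print the length in characters of the longest line in the logfile
    awk '{ if (length($0) > max) max = length($0) } END { print max }' /data/logs/test_log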
>>>>> On 16 July 2014 00:56, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>
>>>>>> Saravana,
>>>>>>
>>>>>> Everything here looks pretty sane. Do you have a record of the
>>>>>> events that came in leading up to the agent stopping collection? If
>>>>>> you can provide the last file created by the agent, and ideally
>>>>>> whatever events had come in but not been written out to your HDFS
>>>>>> sink, it might be possible for me to reproduce this issue. Would it
>>>>>> be possible to get some sample data from you?
>>>>>>
>>>>>> Thanks,
>>>>>> Natty
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 15, 2014 at 10:26 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Natty,
>>>>>>>
>>>>>>> Just to understand: at present my setting is
>>>>>>> "flume.root.logger=INFO,LOGFILE" in log4j.properties. Do you want
>>>>>>> me to change it to "flume.root.logger=DEBUG,LOGFILE" and restart
>>>>>>> the agent?
>>>>>>>
>>>>>>> But when I start the agent, I am already starting it with the
>>>>>>> command below, so I guess I am using DEBUG already, just from the
>>>>>>> command line rather than the config file:
>>>>>>>
>>>>>>> ../bin/flume-ng agent -c /d0/flume/conf -f
>>>>>>> /d0/flume/conf/flume-conf.properties -n a1
>>>>>>> -Dflume.root.logger=DEBUG,LOGFILE
>>>>>>>
>>>>>>> If I make some changes in the config "flume-conf.properties" or
>>>>>>> restart the agent, it works again and starts collecting the data.
>>>>>>>
>>>>>>> Currently all my logs go to flume.log, and I don't see any
>>>>>>> exceptions.
>>>>>>>
>>>>>>> cat flume.log | grep "Exception" doesn't show any.
>>>>>>>
>>>>>>>
>>>>>>> On 15 July 2014 22:24, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>
>>>>>>>> Hi Saravana,
>>>>>>>>
>>>>>>>> Our best bet for figuring out what's going on here may be to turn
>>>>>>>> on debug logging. What I would recommend is stopping your agents,
>>>>>>>> modifying the log4j properties to turn on DEBUG logging for the
>>>>>>>> root logger, and then restarting the agents. Once the agent stops
>>>>>>>> producing new events, send out the logs and I'll be happy to take
>>>>>>>> a look over them.
>>>>>>>>
>>>>>>>> Does the system begin working again if you restart the agents?
>>>>>>>> Have you noticed any other events correlated with the agent
>>>>>>>> stopping collecting events? Maybe a spike in events or something
>>>>>>>> like that? And for my own peace of mind, if you run `cat
>>>>>>>> /var/log/flume-ng/* | grep "Exception"`, does it bring anything
>>>>>>>> back?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Natty
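For reference, the log4j change recommended above is a one-line edit. A sketch, assuming the stock conf/log4j.properties that ships with Flume:

    # conf/log4j.properties
    # was: flume.root.logger=INFO,LOGFILE
    flume.root.logger=DEBUG,LOGFILE
    flume.log.dir=./logs
    flume.log.file=flume.log

A -Dflume.root.logger=DEBUG,LOGFILE flag on the command line takes precedence over this file, which is why Saravana's agent above, started with that flag, is already logging at DEBUG.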
>>>>>>>> On Tue, Jul 15, 2014 at 2:55 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Natty,
>>>>>>>>>
>>>>>>>>> This is my entire config file.
>>>>>>>>>
>>>>>>>>> # Name the components on this agent
>>>>>>>>> a1.sources = r1
>>>>>>>>> a1.sinks = k1
>>>>>>>>> a1.channels = c1
>>>>>>>>>
>>>>>>>>> # Describe/configure the source
>>>>>>>>> a1.sources.r1.type = exec
>>>>>>>>> a1.sources.r1.command = tail -F /data/logs/test_log
>>>>>>>>> a1.sources.r1.restart = true
>>>>>>>>> a1.sources.r1.logStdErr = true
>>>>>>>>>
>>>>>>>>> #a1.sources.r1.batchSize = 2
>>>>>>>>>
>>>>>>>>> a1.sources.r1.interceptors = i1
>>>>>>>>> a1.sources.r1.interceptors.i1.type = regex_filter
>>>>>>>>> a1.sources.r1.interceptors.i1.regex = resuming normal operations|Received|Response
>>>>>>>>>
>>>>>>>>> #a1.sources.r1.interceptors = i2
>>>>>>>>> #a1.sources.r1.interceptors.i2.type = timestamp
>>>>>>>>> #a1.sources.r1.interceptors.i2.preserveExisting = true
>>>>>>>>>
>>>>>>>>> # Describe the sink
>>>>>>>>> a1.sinks.k1.type = hdfs
>>>>>>>>> a1.sinks.k1.hdfs.path = hdfs://testing.sck.com:9000/running/test.sck/date=%Y-%m-%d
>>>>>>>>> a1.sinks.k1.hdfs.writeFormat = Text
>>>>>>>>> a1.sinks.k1.hdfs.fileType = DataStream
>>>>>>>>> a1.sinks.k1.hdfs.filePrefix = events-
>>>>>>>>> a1.sinks.k1.hdfs.rollInterval = 600
>>>>>>>>> ## need to run hive queries randomly to check the long-running
>>>>>>>>> ## process, so we need to commit events to hdfs files regularly
>>>>>>>>> a1.sinks.k1.hdfs.rollCount = 0
>>>>>>>>> a1.sinks.k1.hdfs.batchSize = 10
>>>>>>>>> a1.sinks.k1.hdfs.rollSize = 0
>>>>>>>>> a1.sinks.k1.hdfs.useLocalTimeStamp = true
>>>>>>>>>
>>>>>>>>> # Use a channel which buffers events in memory
>>>>>>>>> a1.channels.c1.type = memory
>>>>>>>>> a1.channels.c1.capacity = 10000
>>>>>>>>> a1.channels.c1.transactionCapacity = 10000
>>>>>>>>>
>>>>>>>>> # Bind the source and sink to the channel
>>>>>>>>> a1.sources.r1.channels = c1
>>>>>>>>> a1.sinks.k1.channel = c1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 14 July 2014 22:54, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Saravana,
>>>>>>>>>>
>>>>>>>>>> What does your sink configuration look like?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Natty
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 11, 2014 at 11:05 PM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Assuming each line in the logfile is considered an event by
>>>>>>>>>>> Flume:
>>>>>>>>>>>
>>>>>>>>>>> 1. Do we have any maximum event size defined for the
>>>>>>>>>>> memory/file channel, like a maximum number of characters in a
>>>>>>>>>>> line?
>>>>>>>>>>>
>>>>>>>>>>> 2. Does Flume support all formats of data to be processed as
>>>>>>>>>>> events, or do we have any limitations?
>>>>>>>>>>>
>>>>>>>>>>> I am still trying to understand why Flume stops processing
>>>>>>>>>>> events after some time.
>>>>>>>>>>>
>>>>>>>>>>> Can someone please help me out here?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Saravana
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11 July 2014 17:49, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am new to Flume and using Apache Flume 1.5.0. Quick setup
>>>>>>>>>>>> explanation here.
>>>>>>>>>>>>
>>>>>>>>>>>> Source: exec, a tail -F command on a logfile.
>>>>>>>>>>>>
>>>>>>>>>>>> Channel: tried with both memory & file channels.
>>>>>>>>>>>>
>>>>>>>>>>>> Sink: HDFS.
>>>>>>>>>>>>
>>>>>>>>>>>> When Flume starts, events are processed properly and moved to
>>>>>>>>>>>> HDFS without any issues.
>>>>>>>>>>>>
>>>>>>>>>>>> But after some time Flume suddenly stops sending events to
>>>>>>>>>>>> HDFS.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not seeing any errors in the logfile flume.log either.
>>>>>>>>>>>> Please let me know if I am missing any configuration here.
>>>>>>>>>>>>
>>>>>>>>>>>> Below is the channel configuration I defined; I left the
>>>>>>>>>>>> remaining settings at their default values.
>>>>>>>>>>>>
>>>>>>>>>>>> a1.channels.c1.type = FILE
>>>>>>>>>>>>
>>>>>>>>>>>> a1.channels.c1.transactionCapacity = 100000
>>>>>>>>>>>>
>>>>>>>>>>>> a1.channels.c1.capacity = 10000000
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Saravana
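One last sketch on the file channel variant above: the file channel persists events on disk, so it is normally given explicit checkpoint and data directories alongside the capacity settings. A minimal example reusing the a1/c1 names; the paths are illustrative, not from this thread:

    # File channel with explicit on-disk locations (paths illustrative)
    a1.channels.c1.type = FILE
    a1.channels.c1.checkpointDir = /d0/flume/file-channel/checkpoint
    a1.channels.c1.dataDirs = /d0/flume/file-channel/data
    a1.channels.c1.capacity = 10000000
    a1.channels.c1.transactionCapacity = 100000

The file channel buffers events on disk rather than in the JVM heap, but the -Xmx advice at the top of this thread still applies to the agent process as a whole.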