Ashish/Natty,

The above solution has been working fine so far. I am planning to move from the exec source to the spooling directory source for reliability.

I plan to use the spooling directory source to move log files into the HDFS sink, and I would like to know how Flume decides whether a file placed in the spool directory is complete or whether the move is still in progress. If a large file is being moved into the spool directory, how does Flume know the transfer has finished, i.e. at what point does it start processing that file?
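For reference, this is a rough sketch of the spooling directory source I have in mind, reusing my current agent and channel names (the spool path and the ignorePattern are only placeholders, not a tested setup):

    a1.sources = r1
    a1.channels = c1

    # Spooling directory source (replacing the exec / tail -F source)
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /data/logs/spool
    # rename fully ingested files instead of deleting them
    a1.sources.r1.fileSuffix = .COMPLETED
    a1.sources.r1.deletePolicy = never
    # skip files that are still being written under a temporary name
    a1.sources.r1.ignorePattern = ^.*\.tmp$
    a1.sources.r1.channels = c1

My understanding from the user guide is that the source expects each file to be complete and immutable once it appears in spoolDir; it does not detect a partially copied file. So the plan would be to write or copy the log under a temporary name (or into another directory on the same filesystem) and mv it into the spool directory only after it is closed, since the rename is atomic. Please correct me if that is not how it behaves.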
Please help me out here.

Thanks,
saravana

On Thu, Jul 17, 2014 at 8:21 PM, SaravanaKumar TR <saran0081...@gmail.com> wrote:

> Thanks Natty & Ashish.
>
> I have restarted the flume agent with the below config. Will monitor it for a couple of days to see whether it stops randomly.
>
> JAVA_OPTS="-Xms1g -Xmx1g -Dcom.sun.management.jmxremote -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./java_pid<pid>.hprof"
>
> Thanks for all again. Hope this will work well.
>
> On 17 July 2014 12:24, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>
>> Thanks, that's really helpful.
>> I guess the default heap dump path is /tmp?
>>
>> On 17 July 2014 12:11, Ashish <paliwalash...@gmail.com> wrote:
>>
>>> Nope, a heap dump shall be generated. Please see more options at http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>>>
>>> To specify the path, use -XX:HeapDumpPath=./java_pid<pid>.hprof
>>>
>>> On Thu, Jul 17, 2014 at 12:09 PM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>
>>>> Yes, sorry, I missed updating it to 1 GB.
>>>>
>>>> But for an out of memory error, do we get notified in the flume logs? I haven't seen any exception till now.
>>>>
>>>> On 17 July 2014 11:55, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>
>>>>> Thanks Ashish. So I will go ahead and update the flume-env.sh file with
>>>>>
>>>>> JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote -XX:-HeapDumpOnOutOfMemoryError"
>>>>>
>>>>> On 17 July 2014 11:39, Ashish <paliwalash...@gmail.com> wrote:
>>>>>
>>>>>> Add the -XX:-HeapDumpOnOutOfMemoryError parameter as well; if your process hits an OOME, it would generate a heap dump. Allocate heap based on the number of events you need to keep in the channel. Try with 1 GB, but calculate it according to the channel size as (average event size * number of events), plus object overheads.
>>>>>>
>>>>>> Please note, this is just a rough calculation; actual memory usage would be higher.
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 11:21 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>
>>>>>>> Okay, thanks. So for 128 GB, I will allocate 1 GB as heap memory for the flume agent.
>>>>>>>
>>>>>>> But I am surprised why there was no error registered for this memory issue in the log file (flume.log).
>>>>>>>
>>>>>>> Do I need to check any other logs?
>>>>>>>
>>>>>>> On 16 July 2014 21:55, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>
>>>>>>>> That's definitely your problem. 20MB is way too low for this. Depending on the other processes you're running on your system, the amount of memory you'll need will vary, but I'd recommend at least 1GB. You should define it exactly where it's defined right now, so instead of the current command, you can run:
>>>>>>>>
>>>>>>>> "/cv/jvendor/bin/java -Xmx1g -Dflume.root.logger=DEBUG,LOGFILE......"
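One small thing I noticed while applying this: with the minus sign, -XX:-HeapDumpOnOutOfMemoryError actually disables the dump on OOME; the plus form enables it. So I am planning to keep flume-env.sh roughly like this (the dump directory is only an example location on my side):

    JAVA_OPTS="-Xms1g -Xmx1g -Dcom.sun.management.jmxremote -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp"

If HeapDumpPath points to a directory, the JVM writes the dump there under the default name java_pid<pid>.hprof.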
>>>>>>>> On Wed, Jul 16, 2014 at 3:03 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I guess I am using default values; from the running flume process I could see these lines: "/cv/jvendor/bin/java -Xmx20m -Dflume.root.logger=DEBUG,LOGFILE......"
>>>>>>>>>
>>>>>>>>> So I guess it takes 20 MB as the flume agent memory.
>>>>>>>>> My RAM is 128 GB, so please suggest how much I can assign as heap memory and where to define it.
>>>>>>>>>
>>>>>>>>> On 16 July 2014 15:05, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Saravana,
>>>>>>>>>>
>>>>>>>>>> I'm attempting to reproduce this, but do you happen to know what the Java heap size is for your Flume agent? This information leads me to believe that you don't have enough memory allocated to the agent, which you may need to do with the -Xmx parameter when you start up your agent. That aside, you can set the byteCapacity parameter on the memory channel to specify how much memory it is allowed to use. It should default to 80% of the Java heap size, but if your heap is too small, this might be a cause of errors.
>>>>>>>>>>
>>>>>>>>>> Does anything get written to the log when you try to pass in an event of this size?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Natty
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 16, 2014 at 1:46 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Natty,
>>>>>>>>>>>
>>>>>>>>>>> While looking further, I could see the memory channel stops if a line comes in greater than 2 MB. Let me know which parameter helps us define a max event size of about 3 MB.
>>>>>>>>>>>
>>>>>>>>>>> On 16 July 2014 12:46, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am asking point 1 because in some cases I could see a line in the logfile of around 2 MB, so I need to know the maximum event size. How do I measure it?
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 July 2014 10:18, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Natty,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help me to get the answers for the below queries.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. In the case of the exec source (tail -F <logfile>), is each line in the file considered to be a single event? If a line is considered to be an event, what is the maximum event size supported by flume? I mean, what is the maximum number of characters in a line supported?
>>>>>>>>>>>>> 2. When events stop processing, I am not seeing the "tail -F" command running in the background. I have used options like "a1.sources.r1.restart = true" and "a1.sources.r1.logStdErr = true". Will these settings not send any errors to flume.log if there are issues with tail? Won't this config try to restart the "tail -F" if it is not running in the background?
>>>>>>>>>>>>> 3. Does flume support all formats of data in the logfile, or does it have predefined data formats?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please help me with these to understand better.
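On the byteCapacity point above, this is roughly how I would size the memory channel explicitly for a 1 GB heap (the numbers are only an illustration, not tested values):

    a1.channels.c1.type = memory
    # maximum number of events held in the channel
    a1.channels.c1.capacity = 10000
    a1.channels.c1.transactionCapacity = 1000
    # cap on the total event body bytes held in the channel (~200 MB here)
    a1.channels.c1.byteCapacity = 200000000
    # share of byteCapacity reserved for event headers (default 20)
    a1.channels.c1.byteCapacityBufferPercentage = 20

If byteCapacity is left unset it defaults to 80% of the JVM heap, which would explain why a 20 MB heap fills up quickly once 2-3 MB lines start arriving.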
>>>>>>>>>>>>> On 16 July 2014 00:56, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Saravana,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Everything here looks pretty sane. Do you have a record of the events that came in leading up to the agent stopping collection? If you can provide the last file created by the agent, and ideally whatever events had come in but not been written out to your HDFS sink, it might be possible for me to reproduce this issue. Would it be possible to get some sample data from you?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Natty
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 15, 2014 at 10:26 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Natty,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just to understand: at present my setting is "flume.root.logger=INFO,LOGFILE" in log4j.properties. Do you want me to change it to "flume.root.logger=DEBUG,LOGFILE" and restart the agent?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But when I start the agent, I am already starting it with the below command. I guess I am already using DEBUG, just from the command line rather than the config file:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ../bin/flume-ng agent -c /d0/flume/conf -f /d0/flume/conf/flume-conf.properties -n a1 -Dflume.root.logger=DEBUG,LOGFILE
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If I make some changes in the config "flume-conf.properties" or restart the agent, it works again and starts collecting the data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Currently all my logs go to flume.log, and I don't see any exception.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cat flume.log | grep "Exception" doesn't show any.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15 July 2014 22:24, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Saravana,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Our best bet on figuring out what's going on here may be to turn on the debug logging. What I would recommend is stopping your agents, modifying the log4j properties to turn on DEBUG logging for the root logger, and then restarting the agents. Once the agent stops producing new events, send out the logs and I'll be happy to take a look over them.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does the system begin working again if you restart the agents? Have you noticed any other events correlated with the agent stopping collecting events? Maybe a spike in events or something like that? And for my own peace of mind, if you run `cat /var/log/flume-ng/* | grep "Exception"`, does it bring anything back?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>> Natty
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jul 15, 2014 at 2:55 AM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Natty,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is my entire config file.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Name the components on this agent
>>>>>>>>>>>>>>>>> a1.sources = r1
>>>>>>>>>>>>>>>>> a1.sinks = k1
>>>>>>>>>>>>>>>>> a1.channels = c1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Describe/configure the source
>>>>>>>>>>>>>>>>> a1.sources.r1.type = exec
>>>>>>>>>>>>>>>>> a1.sources.r1.command = tail -F /data/logs/test_log
>>>>>>>>>>>>>>>>> a1.sources.r1.restart = true
>>>>>>>>>>>>>>>>> a1.sources.r1.logStdErr = true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #a1.sources.r1.batchSize = 2
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> a1.sources.r1.interceptors = i1
>>>>>>>>>>>>>>>>> a1.sources.r1.interceptors.i1.type = regex_filter
>>>>>>>>>>>>>>>>> a1.sources.r1.interceptors.i1.regex = resuming normal operations|Received|Response
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #a1.sources.r1.interceptors = i2
>>>>>>>>>>>>>>>>> #a1.sources.r1.interceptors.i2.type = timestamp
>>>>>>>>>>>>>>>>> #a1.sources.r1.interceptors.i2.preserveExisting = true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Describe the sink
>>>>>>>>>>>>>>>>> a1.sinks.k1.type = hdfs
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.path = hdfs://testing.sck.com:9000/running/test.sck/date=%Y-%m-%d
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.writeFormat = Text
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.fileType = DataStream
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.filePrefix = events-
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.rollInterval = 600
>>>>>>>>>>>>>>>>> ## need to run hive queries randomly to check the long running process, so we need to commit events to the hdfs files regularly
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.rollCount = 0
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.batchSize = 10
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.rollSize = 0
>>>>>>>>>>>>>>>>> a1.sinks.k1.hdfs.useLocalTimeStamp = true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Use a channel which buffers events in memory
>>>>>>>>>>>>>>>>> a1.channels.c1.type = memory
>>>>>>>>>>>>>>>>> a1.channels.c1.capacity = 10000
>>>>>>>>>>>>>>>>> a1.channels.c1.transactionCapacity = 10000
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Bind the source and sink to the channel
>>>>>>>>>>>>>>>>> a1.sources.r1.channels = c1
>>>>>>>>>>>>>>>>> a1.sinks.k1.channel = c1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 14 July 2014 22:54, Jonathan Natkins <na...@streamsets.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Saravana,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What does your sink configuration look like?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Natty
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jul 11, 2014 at 11:05 PM, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Assuming each line in the logfile is considered an event for flume:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. Do we have any maximum event size defined for the memory/file channel, like a maximum number of characters in a line?
>>>>>>>>>>>>>>>>>>> 2. Does flume support all formats of data to be processed as events, or do we have any limitations?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am still trying to understand why flume stops processing events after some time.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can someone please help me out here.
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> saravana
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 11 July 2014 17:49, SaravanaKumar TR <saran0081...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am new to flume and am using Apache Flume 1.5.0. A quick setup explanation here:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Source: exec, tail -F command on a logfile.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Channel: tried with both memory & file channel
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sink: HDFS
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> When flume starts, events are processed properly and moved to hdfs without any issues.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> But after some time flume suddenly stops sending events to HDFS.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am not seeing any errors in the logfile flume.log either. Please let me know if I am missing any configuration here.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Below is the channel configuration defined; I left the remaining settings at their default values.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> a1.channels.c1.type = FILE
>>>>>>>>>>>>>>>>>>>> a1.channels.c1.transactionCapacity = 100000
>>>>>>>>>>>>>>>>>>>> a1.channels.c1.capacity = 10000000
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Saravana
>>>>>>
>>>>>> --
>>>>>> thanks
>>>>>> ashish
>>>>>>
>>>>>> Blog: http://www.ashishpaliwal.com/blog
>>>>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>>
>>> --
>>> thanks
>>> ashish
>>>
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
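For completeness, when I tried the file channel earlier I only set capacity and transactionCapacity. If I go back to it, I would also spell out the checkpoint and data directories rather than relying on the defaults under the flume user's home directory (the paths below are only placeholders on my side):

    a1.channels.c1.type = file
    # where the channel keeps its checkpoint and data files
    a1.channels.c1.checkpointDir = /d0/flume/file-channel/checkpoint
    a1.channels.c1.dataDirs = /d0/flume/file-channel/data
    a1.channels.c1.capacity = 1000000
    a1.channels.c1.transactionCapacity = 10000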