AFAIK, an interceptor is the best way to do this without touching the Flume code. How bad does the performance become? If you have collected some numbers, can you share them?
The other way is to modify the SpoolDirectory source code and save the folder name in a variable. The source opens each file for processing; at that point you can parse the folder name once and then keep adding it to a specific header on each event.

Another way is to keep track of the folder in the interceptor and update it only when the file name changes. You still have to check each event, but if you can come up with an efficient way to detect a change in the file name, the perf hit would go down.

HTH!

On Wed, Feb 11, 2015 at 9:47 AM, mahendran m <mahendra...@hotmail.com> wrote:
> Hi,
>
> I am moving logs from a local machine to an HDFS server using Flume with a
> spooling directory. Each log contains lakhs of lines.
>
> My use case is below.
>
> Log file name: foldername-filename-timestamp.suffix
> Example file name: LogFiles-Log1-1463238298.log
>
> My conf is below:
>
> a1.sinks = k1
> a1.channels = c1
>
> #the source
>
> a1.sources.r1.type = spooldir
> a1.sources.r1.spoolDir = F:\\SpoolingDirectory
> a1.sources.r1.deletePolicy=immediate
> a1.sources.r1.fileHeader = true
> a1.sources.r1.interceptors = i1
> a1.sources.r1.interceptors.i1.type = com.company.CustomInterceptor.CustomInterceptor$Builder
>
> #the sink
> a1.sinks.k1.type = hdfs
> a1.sinks.k1.hdfs.fileType = DataStream
> a1.sinks.k1.hdfs.fileSuffix= .txt
> a1.sinks.k1.hdfs.path = hdfs://localhost:9000/spoolingdirectory/{foldername}
>
> #Channel
> a1.channels.c1.type = memory
> a1.channels.c1.capacity = 10000
> a1.channels.c1.transactionCapacity = 1000
>
> #Flow
> a1.sources.r1.channels = c1
> a1.sinks.k1.channel = c1
>
>
> In the custom interceptor we process the file header, extract the folder
> name, and add it as the {foldername} header, which is used in the HDFS path.
> The problem we are facing is that for a single file with lakhs of lines, the
> interceptor extracts the same folder name lakhs of times, which leads to very
> high performance degradation.
>
> Is there any way to handle my case without processing the same file header
> lakhs of times?
>
> thanks.
>

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
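
A rough sketch of the "keep track of the folder in the interceptor" idea from the reply above, written as plain Java with no Flume dependency so it can be tried on its own. FolderNameCache and its methods are hypothetical names, not part of Flume. The parsing assumes the foldername-filename-timestamp.suffix naming from the original message, and the class assumes it is called from a single thread.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of Flume): remembers the folder name parsed
 * from the most recently seen file path and re-parses only when the path
 * changes. Not thread-safe; assumes one caller thread.
 */
final class FolderNameCache {
    private String lastPath;    // file path seen on the previous call
    private String lastFolder;  // folder name parsed from lastPath
    private int parseCount;     // number of times a path was actually parsed

    /** Returns the folder prefix for the given path, parsing only on a change. */
    String folderFor(String filePath) {
        if (filePath == null) {
            return null; // no file header on this event, nothing to derive
        }
        // Per-event cost is one string comparison while the file stays the same.
        if (!Objects.equals(filePath, lastPath)) {
            lastPath = filePath;
            lastFolder = parse(filePath);
            parseCount++;
        }
        return lastFolder;
    }

    /** Exposed only so the caching behaviour can be observed. */
    int parseCount() {
        return parseCount;
    }

    /** Extracts "foldername" from ".../foldername-filename-timestamp.suffix". */
    private static String parse(String filePath) {
        // Drop the directory part; accept both '/' and '\' as separators.
        int sep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        String name = filePath.substring(sep + 1);
        // Assumption: the folder name itself contains no '-'.
        int dash = name.indexOf('-');
        return dash < 0 ? name : name.substring(0, dash);
    }
}
```

Inside the custom interceptor, the intercept(Event) method would presumably read the file path from the header that the spooling directory source adds when fileHeader = true (the key is "file" unless fileHeaderKey is changed), pass it to folderFor, and put the result under the foldername header. Note that the HDFS sink substitutes event headers in hdfs.path using the %{header} form, so the path would be expected to reference %{foldername}.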