Hello Pankaj,

All changes for the 1.3.1 release (over 1.3.0) are listed on the release notes page: http://flume.apache.org/releases/1.3.1.html
On Sun, Aug 18, 2013 at 11:20 AM, Pankaj Gupta <[email protected]> wrote:
> Hi Hari,
>
> Just curious about the performance improvement: can you provide the number of
> the JIRA that improves performance in 1.3.1?
>
> Thanks,
> Pankaj
>
>
> On Wed, Aug 14, 2013 at 2:23 PM, Hari Shreedharan
> <[email protected]> wrote:
>>
>> Flume v1.3.0 had a major performance issue, which is why 1.3.1 was released
>> immediately after. The current stable release is 1.4.0, so you should use
>> that.
>>
>> 1. Can you detail this point? Channel to Sink should really not have any
>> exceptions. If the sink or a plugin the sink is using is causing rollbacks,
>> then that should handle the failure cases, drop events, etc. The channel is
>> pretty much a passive component, just like a queue. "Bad events" are events
>> that sinks cannot handle for some reason. The logic for handling them should
>> be in the sink itself.
>>
>> 2. Currently that is not an option, but if you need it, chances are there
>> are others who do too. Please explain your use case in a JIRA. Remember,
>> Flume is not a file streaming system, it is an event streaming one, so each
>> file is still converted into events by Flume.
>>
>> 3. If you think the current deserializers don't fit your use case, you can
>> easily write your own and drop it in.
>>
>>
>> Thanks,
>> Hari
>>
>> On Wednesday, August 14, 2013 at 1:58 PM, Robert Heise wrote:
>>
>> Hello,
>>
>> As I continue to ramp up using Apache Flume (v1.3.0), I have observed a
>> few challenges and am hoping somebody who has more experience can shed some
>> light.
>>
>> 1. Establishing a data pipeline is trivial. What I have noticed is that
>> any exceptions caught from the channel->sink operation invoke what appears
>> to be a repeating cycle of exceptions. As an example, any events which
>> cause an exception (Java stack trace) put the agent into a tailspin. There
>> are no tools for managing the pipeline: identifying culprit events/files,
>> stopping, purging the channel, introspecting the pipeline, etc. The best
>> course of action is to purge everything under file-channel and restart the
>> agent. I've read several posts suggesting that regex interceptors
>> could be a potential fix, but it is almost impossible to predict, in a
>> production environment, what exceptions are going to occur. In my opinion,
>> there has to be a declarative way to move bad events out of the channel
>> to a "dead-letter-queue" or equivalent.
>> 2. I was hoping that the Spooling Directory Source would help us capture
>> file metadata, but nothing ever appears in the default .flumespool
>> trackerDir option?
>> 3. Maybe my use case is not the right fit for Flume, but my largest design
>> constraint is that we deal with files; everything we do is based on files.
>> I was hoping that the spooldir and batch control options would provide an
>> intuitive way to process files arriving in a spool directory, and ultimately
>> land that same data in HDFS. However, a file with 470,000 lines is creating
>> over 52MM events, and because the tooling is weak, I have no visibility into
>> why that many events are being created or how far the agent is from
>> completing. The data flow architecture is perfect, but maybe Flume is best
>> used for logs, tailing of files, etc., not necessarily processing files?
>>
>> Thanks
>>
>>
>
>
>
> --
>
>
> P | (415) 677-9222 ext. 205 F | (415) 677-0895 | [email protected]
>
> Pankaj Gupta | Software Engineer
>
> BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com
>
>
> United States | Canada | United Kingdom | Germany
>
>
> We're hiring!

--
Harsh J
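
For readers following point 1 of the quoted thread (the sink, not the channel, should decide what happens to events it cannot handle), here is a minimal sketch of the "dead-letter" idea in plain Python. It is not Flume code and does not use the Flume API: `drain`, `strict_int_sink`, `max_attempts`, and the dead-letter list are hypothetical names chosen for illustration, and the channel is modelled as a simple in-memory queue.

```python
from collections import deque


def drain(channel, deliver, max_attempts=3):
    """Hand every event in a queue-like channel to `deliver`.

    An event whose delivery keeps raising is moved to a dead-letter list
    after `max_attempts` failures instead of being retried forever.
    Illustrative only: none of these names belong to the Flume API.
    """
    delivered, dead_letters = [], []
    # Track how many times each event has failed so far.
    pending = deque((event, 0) for event in channel)
    while pending:
        event, failures = pending.popleft()
        try:
            deliver(event)
        except Exception as exc:
            failures += 1
            if failures >= max_attempts:
                # Give up on this event and keep the reason for later inspection.
                dead_letters.append((event, repr(exc)))
            else:
                # Comparable to a rollback: the event goes back for a retry.
                pending.append((event, failures))
            continue
        delivered.append(event)
    return delivered, dead_letters


def strict_int_sink(event):
    """Hypothetical sink that rejects any event that is not an integer string."""
    int(event)


if __name__ == "__main__":
    ok, dead = drain(["1", "2", "oops", "3"], strict_int_sink)
    print("delivered:", ok)
    print("dead letters:", [event for event, _ in dead])
```

The point of the sketch is only that a bounded retry count plus a side list keeps one bad event from blocking the rest of the queue. How that would map onto a real sink is left open here.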
