Mike, where is the SpoolingFileSource that you are referring to? -roshan
On Tue, Jan 22, 2013 at 6:39 PM, Mike Percy <[email protected]> wrote:
> Hi Roshan,
> Yep, in general I'd have concerns w.r.t. capacity planning and garbage
> collector behavior for large events. Flume holds at least one event batch
> in memory at once, depending on the number of sources/sinks, and even with
> a batch size of 1, if you have unpredictably large events there is nothing
> preventing an OutOfMemoryError in extreme cases. But if you plan for
> capacity and test thoroughly, then it can be made to work.
>
> Regards,
> Mike
>
> On Tue, Jan 22, 2013 at 3:38 PM, Roshan Naik <[email protected]> wrote:
>> I recall some discussion about being cautious on the size of the events
>> (in this case the file being moved), as Flume is not quite intended for
>> large events. Mike, perhaps you can throw some light on that aspect?
>>
>> On Tue, Jan 22, 2013 at 12:17 AM, Mike Percy <[email protected]> wrote:
>>> Check out the latest changes to SpoolingFileSource w.r.t.
>>> EventDeserializers on trunk. You can deserialize a whole file that way
>>> if you want. Whether that is a good idea depends on your use case, though.
>>>
>>> It's on trunk, lacking user docs for the latest changes, but I will try
>>> to hammer out updated docs soon. In the meantime, you can just look at
>>> the code and read the comments.
>>>
>>> Regards,
>>> Mike
>>>
>>> On Monday, January 21, 2013, Nitin Pawar wrote:
>>>> You can't configure it to send the entire file in an event unless you
>>>> have a fixed number of events in each of the files. Basically, it reads
>>>> the entire file into a channel and then starts writing.
>>>>
>>>> So as long as you can limit the events in the file, I think you can
>>>> send the entire file as a transaction, but not as a single event. As
>>>> far as I understand, Flume treats each individual line in the file as
>>>> an event.
>>>>
>>>> If you want to pull the entire file, then you may want to implement
>>>> that with messaging queues, where you send an event to an ActiveMQ
>>>> queue and then your consumer pulls the file in one transaction with
>>>> some other mechanism like FTP or SCP.
>>>>
>>>> Others will have better ideas; I am just suggesting a crude way to get
>>>> the entire file as a single event.
>>>>
>>>> On Tue, Jan 22, 2013 at 12:19 PM, Henry Ma <[email protected]> wrote:
>>>>> As far as I know, the Directory Spooling Source will send the file
>>>>> line by line as events, and the File Roll Sink will receive these
>>>>> lines and roll them up into a big file at a fixed interval. Is that
>>>>> right, and can we configure it to send the whole file as one event?
>>>>>
>>>>> On Tue, Jan 22, 2013 at 1:22 PM, Nitin Pawar <[email protected]> wrote:
>>>>>> Why don't you use directory spooling?
>>>>>>
>>>>>> On Tue, Jan 22, 2013 at 7:15 AM, Henry Ma <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> When using Flume to collect log files, we want to just COPY the
>>>>>>> original files from several servers to a central storage (unix file
>>>>>>> system), not roll them up into a big file, because we must record
>>>>>>> some attributes of the original file such as name, host, path,
>>>>>>> timestamp, etc. Besides, we want to guarantee total reliability:
>>>>>>> no file missed, no file duplicated.
>>>>>>>
>>>>>>> It seems that, in the Source, we must put a whole file (size may be
>>>>>>> between 100KB and 100MB) into a Flume event; and in the Sink, we
>>>>>>> must write each event to a single file.
>>>>>>>
>>>>>>> Is it practicable? Thanks!
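For anyone finding this thread later: the whole-file approach Mike hints at (a spooling directory source with a non-line-based EventDeserializer) can be sketched as an agent config like the one below. This is a sketch under assumptions, not something confirmed in this thread: the `BlobDeserializer` class name comes from what later shipped in Flume 1.4's morphline module, and the `agent1` component names and paths are made up for illustration.

```properties
# Hypothetical Flume agent config: spool a directory and emit each file
# as ONE event, instead of the default one-event-per-line behavior.
# Assumes a blob-style EventDeserializer is on the classpath (later
# shipped as org.apache.flume.sink.solr.morphline.BlobDeserializer).
agent1.sources = spool1
agent1.channels = ch1
agent1.sinks = roll1

agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/log/flume-spool
agent1.sources.spool1.channels = ch1
# Swap the default LINE deserializer for a whole-file (blob) one.
agent1.sources.spool1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
# Cap the event body at 100 MB, the upper bound mentioned in this thread.
agent1.sources.spool1.deserializer.maxBlobLength = 104857600

# A file channel keeps the 100 MB payload on disk rather than duplicating
# it in heap the way a memory channel would.
agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
agent1.channels.ch1.dataDirs = /var/flume/data

agent1.sinks.roll1.type = file_roll
agent1.sinks.roll1.channel = ch1
agent1.sinks.roll1.sink.directory = /data/central-store
# rollInterval = 0 disables time-based rolling, so events are not merged
# into a big file on a timer.
agent1.sinks.roll1.sink.rollInterval = 0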
