Re: Fastest way to get data into flume?

2014-03-27 Thread Andrew Ehrlich
What about having more than one Flume agent? You could have two agents that read the small messages and sink to HDFS, or two agents that read the messages, serialize them, and send them to a third agent that sinks them into HDFS. On Thu, Mar 27, 2014 at 9:43 AM, Chris Schneider < ch...@christop
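
A minimal sketch of that two-tier topology, assuming agent names, hostnames, and ports that are made up for illustration (the tier-1 source type would be whatever actually produces the small messages):

# Tier-1 agent: reads messages and forwards them over Avro to the collector
tier1.sources = r1
tier1.channels = c1
tier1.sinks = avroOut
tier1.sources.r1.type = netcat
tier1.sources.r1.bind = 0.0.0.0
tier1.sources.r1.port = 44444
tier1.sources.r1.channels = c1
tier1.channels.c1.type = memory
tier1.sinks.avroOut.type = avro
tier1.sinks.avroOut.hostname = collector.example.com
tier1.sinks.avroOut.port = 4545
tier1.sinks.avroOut.channel = c1

# Collector agent: receives from the tier-1 agents and writes to HDFS
collector.sources = avroIn
collector.channels = c1
collector.sinks = hdfsOut
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4545
collector.sources.avroIn.channels = c1
collector.channels.c1.type = memory
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events
collector.sinks.hdfsOut.channel = c1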

ChannelException, bytecapacity

2014-05-14 Thread Andrew Ehrlich
I can't understand what this error is trying to tell me. Can anyone help? Caused by: org.apache.flume.ChannelException: Put queue for MemoryTransaction of byteCapacity 1832743000 bytes cannot add an event of size 598876 bytes because 299200 bytes are already used. Try consider comitting more freq
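That exception comes from the memory channel when a put cannot fit within the channel's configured byteCapacity, which usually means the sink is draining more slowly than the source is writing, or the limits are set too low. A sketch of the relevant memory channel properties follows; the agent and channel names and all values are illustrative, not recommendations:

# Memory channel sizing knobs (values are examples only)
agent.channels.c1.type = memory
# Maximum number of events the channel can hold
agent.channels.c1.capacity = 100000
# Events per transaction; smaller values mean more frequent commits
agent.channels.c1.transactionCapacity = 1000
# Maximum bytes of event bodies allowed in the channel
agent.channels.c1.byteCapacity = 2000000000
# Percentage of byteCapacity reserved for event headers (default 20)
agent.channels.c1.byteCapacityBufferPercentage = 20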

Re: flume hdfs sink notify / callback to add partition

2014-08-04 Thread Andrew Ehrlich
What about using a workflow tool like Oozie, Azkaban, or Amazon Data Pipeline? Set them to be triggered as soon as the S3 bucket is available and execute the ALTER TABLE command. On Thursday, July 31, 2014, Viral Bajaria wrote: > Any suggestions on this ? Still trying to figure out how do I get

Re: Collecting thousands of sources

2014-09-04 Thread Andrew Ehrlich
One way to avoid managing so many sources would be to have an aggregation point between the data generators and the Flume sources. For example, maybe you could have the data generators put events into a message queue (or queues), then have Flume consume from there? Andrew On Thu, 04 Sep 2014 08:29:04
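
As one concrete possibility (not the only one), Flume ships a JMS source, so the generators could publish to a queue on a broker and a single agent could drain it. The broker URL, queue name, and agent names below are assumed for illustration:

# Agent consuming from a JMS queue and writing to HDFS
agent.sources = jmsIn
agent.channels = c1
agent.sinks = hdfsOut
agent.sources.jmsIn.type = jms
agent.sources.jmsIn.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
agent.sources.jmsIn.connectionFactory = ConnectionFactory
agent.sources.jmsIn.providerURL = tcp://mq-broker.example.com:61616
agent.sources.jmsIn.destinationName = flume.events
agent.sources.jmsIn.destinationType = QUEUE
agent.sources.jmsIn.channels = c1
agent.channels.c1.type = memory
agent.sinks.hdfsOut.type = hdfs
agent.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events
agent.sinks.hdfsOut.channel = c1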

Re: Newbie - Sink question

2014-09-04 Thread Andrew Ehrlich
What about adding in the data from MySQL as a small batch job after Flume sinks to S3? You could then delete the raw data that Flume wrote. I would worry that the database connection would be relatively slow and unreliable and might reduce Flume's throughput. Andrew On Sep 4, 2014, at 7:53 PM, K