Hi Denny, have you considered saving those files to HDFS and sending the "event" information to Kafka?
You could then pass that off to Apache Spark in a consumer and get data locality for the saved file (or something of the sort [no pun intended]).

You could also stream every line of the file (or however you want to "chunk" it) as a separate message to the broker, with a wrapping message object so you know which file you are dealing with when consuming.

What you plan to do with the data has a lot to do with how you are going to process and manage it.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

On Tue, Jun 24, 2014 at 11:35 AM, Denny Lee <denny.g....@gmail.com> wrote:

> By any chance has anyone worked with using Kafka with message sizes that
> are approximately 50MB in size? Based on from some of the previous threads
> there are probably some concerns on memory pressure due to the compression
> on the broker and decompression on the consumer and a best practices on
> ensuring batch size (to ultimately not have the compressed message exceed
> message size limit).
>
> Any other best practices or thoughts concerning this scenario?
>
> Thanks!
> Denny
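
To make the "wrapping message object" suggestion in the reply above a bit more concrete, here is a minimal sketch in Python. It is only one possible shape for such an envelope, not anything prescribed in the thread: the field names (`file_id`, `seq`, `eof`, `payload`), the JSON encoding, and the helper names `wrap_lines` and `reassemble` are all assumptions made for illustration. The sketch deliberately leaves out the Kafka producer and consumer calls and only shows how each line can be tagged with its source file and put back together afterwards.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional


def wrap_lines(file_id: str, lines: Iterable[str]) -> Iterator[bytes]:
    """Yield one JSON envelope per line, tagged with the file it came from.

    The envelope layout is hypothetical; any serialization would do.
    """
    seq = -1
    for seq, line in enumerate(lines):
        envelope = {"file_id": file_id, "seq": seq, "eof": False, "payload": line}
        yield json.dumps(envelope).encode("utf-8")
    # Trailing marker: its seq equals the number of data messages sent,
    # so a consumer can tell when it has seen the whole file.
    marker = {"file_id": file_id, "seq": seq + 1, "eof": True, "payload": None}
    yield json.dumps(marker).encode("utf-8")


def reassemble(messages: Iterable[bytes], file_id: str) -> List[str]:
    """Collect the lines of one file, ordered by sequence number."""
    parts: Dict[int, str] = {}
    expected: Optional[int] = None
    for raw in messages:
        envelope = json.loads(raw.decode("utf-8"))
        if envelope["file_id"] != file_id:
            continue  # message belongs to a different file
        if envelope["eof"]:
            expected = envelope["seq"]
        else:
            parts[envelope["seq"]] = envelope["payload"]
    if expected is None or len(parts) != expected:
        raise ValueError("incomplete file: %s" % file_id)
    return [parts[i] for i in range(expected)]
```

In a real pipeline each envelope would be the value of one Kafka message. Using `file_id` as the message key is a common choice, since with the default partitioner messages that share a key go to the same partition and so keep their relative order there.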