Re: Spark Streaming S3 Performance Implications

2015-04-01 Thread Mike Trienis
Hey Chris,
Apologies for the delayed reply. Your responses are always insightful and appreciated :-) However, I have a few more questions.
"also, it looks like you're writing to S3 per RDD. you'll want to broaden that out to write DStream batches"
I assume you mean "dstream.saveAsTextFiles(...

Re: Spark Streaming S3 Performance Implications

2015-03-21 Thread Ted Yu
Mike: Once Hadoop 2.7.0 is released, you should be able to enjoy the enhanced performance of s3a. See HADOOP-11571.
Cheers
On Sat, Mar 21, 2015 at 8:09 AM, Chris Fregly wrote:
> hey mike!
>
> you'll definitely want to increase your parallelism by adding more shards
> to the stream - as well as s

Re: Spark Streaming S3 Performance Implications

2015-03-21 Thread Chris Fregly
hey mike!
you'll definitely want to increase your parallelism by adding more shards to the stream - as well as spinning up 1 receiver per shard and unioning all the shards per the KinesisWordCount example that is included with the kinesis streaming package.
you'll need more cores (cluster) or t