Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-23 Thread Jon Chase
Turns out I was using the s3:// prefix (in a standalone Spark cluster). It was writing a LOT of block_* files to my S3 bucket, which was the cause of the slowness. I was coming from Amazon EMR, where Amazon's underlying FS implementation has re-mapped s3:// to s3n://, which doesn't use the block
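
For anyone hitting the same thing, here is a minimal sketch of the prefix change described above, assuming the paths are plain strings of the form s3://bucket/key. The helper name `to_s3n` and the bucket names are hypothetical and not from the original post; the function only rewrites the URI scheme and does not talk to S3 or Spark.

```python
def to_s3n(uri: str) -> str:
    """Rewrite an s3:// URI to s3n://, leaving every other URI untouched.

    Hypothetical helper for illustration: on a standalone (non-EMR) cluster
    the s3:// scheme selects the block-based filesystem, while s3n://
    selects the native one that reads and writes ordinary objects.
    """
    prefix = "s3://"
    if uri.startswith(prefix):
        # Keep the bucket and key exactly as given; only the scheme changes.
        return "s3n://" + uri[len(prefix):]
    return uri


if __name__ == "__main__":
    # Bucket and key below are made-up examples.
    print(to_s3n("s3://my-bucket/input/part-00000"))
```

The rewritten path would then be what gets handed to the job's input and output calls (for example `sc.textFile(...)` in PySpark), assuming S3 credentials are already configured for the s3n scheme.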

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Paul Brown
I would suggest checking out disk IO on the nodes in your cluster and then reading up on the limiting behaviors that accompany different kinds of EC2 storage. Depending on how things are configured for your nodes, you may have a local storage configuration that provides "bursty" IOPS where you get

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Nicholas Chammas
Is the operation slow every time or does it run normally if you repeat the operation within the same app?

Nick

On Thu, Dec 18, 2014 at 8:56 AM, Jon Chase wrote:
> I'm running a very simple Spark application that downloads files from S3,
> does a bit of mapping, then uploads new files. Each fi