Make sure you are setting the number of executors (--num-executors) correctly.
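For what it's worth, here is a rough sketch of how the executor flags for spark-submit on YARN could be derived from the cluster size. The helper name, the sizing rule (one slot left for the application master), the cluster numbers, and job.py are all assumptions for illustration, not values from this thread; only the --num-executors and --executor-cores flags themselves are standard spark-submit options.

```python
# Hypothetical helper: the sizing rule and example numbers are assumptions,
# not settings taken from this thread.
def executor_args(worker_nodes, vcores_per_node, cores_per_executor=4):
    """Build spark-submit flags that spread executors over all worker nodes."""
    if cores_per_executor < 1 or cores_per_executor > vcores_per_node:
        raise ValueError("cores_per_executor must be between 1 and vcores_per_node")
    executors_per_node = vcores_per_node // cores_per_executor
    # Leave one executor slot free for the YARN application master,
    # but never request fewer than one executor.
    num_executors = max(1, worker_nodes * executors_per_node - 1)
    return [
        "--num-executors", str(num_executors),
        "--executor-cores", str(cores_per_executor),
    ]


if __name__ == "__main__":
    # Made-up cluster: 4 worker nodes with 8 vcores each; job.py is a placeholder.
    print(" ".join(["spark-submit"] + executor_args(4, 8) + ["job.py"]))
```

If too few executors are requested, most of the cluster sits idle no matter how the input is split, so it may be worth checking this before looking at anything else.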
M
> On Jul 17, 2015, at 9:16 PM, Charles Menguy wrote:
I am trying to use PySpark on EMR to analyze some data stored as
SequenceFiles on S3, but running into performance issues due to data
locality. Here is a very simple sample that doesn't work well:
seqRDD = sc.sequenceFile("s3n://:@//day=2015-07-04/hour=*/*")
seqRDD.count()
The issue is with the c