Hi,
In the latest release of Spark I have seen significant improvements when the
data is in Parquet format, which I see yours is.
However, since you are using the older sqlContext API rather than a
SparkSession, there is a high chance that you are not
using the spark impro
On another note, is there any particular reason you are using s3a://
instead of s3://?
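In case it helps, here is a minimal PySpark sketch of what reading through a
SparkSession (the entry point introduced in Spark 2.0) could look like. The
app name and bucket path are made-up placeholders, not taken from your job,
and this only illustrates the entry point, not a tuned configuration.

```python
def build_session(app_name="parquet-read-sketch"):
    """Create (or reuse) a SparkSession, the Spark 2.x entry point."""
    # Imported lazily so read_parquet below can be exercised without Spark.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


def read_parquet(spark, path):
    """Load a Parquet data set as a DataFrame via the session's reader."""
    return spark.read.parquet(path)


# Example usage (placeholder path; assumes S3 credentials are configured):
#   spark = build_session()
#   df = read_parquet(spark, "s3a://example-bucket/path/to/data")
#   df.printSchema()
```

The session is passed in as an argument so the same read call works whether
the session was created here or already exists in your application.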
Regards,
Gourav
On Wed, Mar 22, 2017 at 8:30 PM, Matt Deaver wrote:
> For various reasons, our data set is partitioned in Spark by customer id
> and saved to S3. When trying to read this data, however,
Could you share the event timeline and DAG for the time-consuming stages in
the Spark UI?
On Thu, Mar 23, 2017 at 4:30 AM, Matt Deaver wrote:
> For various reasons, our data set is partitioned in Spark by customer id
> and saved to S3. When trying to read this data, however, the larger
> partitions m