Hello everyone,
I'm using 'r4.8xlarge' instances on EMR for my Spark application.
To each node, I'm attaching one 512 GB EBS volume.
By logging in to the nodes I tried to verify that this volume is set as
'spark.local.dir' by EMR automatically, but couldn't find any such
configuration.
Can
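One way to check what is actually in effect is a minimal sketch like the following, assuming a spark-shell session on the cluster (`spark` is the shell-provided session; on YARN the executor scratch space may instead come from yarn.nodemanager.local-dirs, so an unset value here does not necessarily mean the EBS volume is unused):

  // Print the effective spark.local.dir, if one was set on this cluster.
  println(
    spark.sparkContext.getConf
      .getOption("spark.local.dir")
      .getOrElse("spark.local.dir not set; node-manager local dirs likely apply")
  )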
Hi Jonathan,
Does that mean Hadoop-AWS 2.7.3 is also built against AWS SDK 1.11.160 and
not 1.7.4?
Thanks.
On Oct 7, 2017 3:50 PM, "Jean Georges Perrin" wrote:
Hey Marco,
I am actually reading from S3 and I use 2.7.3, but I inherited the project
and they use some AWS API from Amazon SDK, whi
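For reference, a minimal sbt sketch of the version pairing in question (coordinates are illustrative; as far as I know, hadoop-aws 2.7.3 declares aws-java-sdk 1.7.4 in its POM, so putting a newer SDK on the same classpath usually means shading or carefully aligning versions):

  // Illustrative sbt dependencies only; hadoop-aws 2.7.3 was built against
  // aws-java-sdk 1.7.4, not the 1.11.x line.
  libraryDependencies ++= Seq(
    "org.apache.hadoop" % "hadoop-aws"   % "2.7.3",
    "com.amazonaws"     % "aws-java-sdk" % "1.7.4"
  )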
Hello Sparkans,
I want to merge the following clusters / sets of IDs into one if they share
IDs.
For example:
uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4
uuid_3_2,uuid_3_5,uuid_3_6
uuid_3_5,uuid_3_7,uuid_3_8,uuid_3_9
into a single one:
uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4,uuid_3_5,uuid_3_6,uuid_3_7,uuid_3_8,uuid_3_9
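This is essentially a connected-components problem. A minimal sketch using GraphX (the input path, the one-set-per-line format, and the use of hashCode as a vertex id are all assumptions; hashCode can collide, so a proper id-to-long mapping would be safer in practice):

  import org.apache.spark.graphx.{Edge, Graph}

  // One comma-separated set of ids per line (hypothetical path).
  val sets = sc.textFile("s3://bucket/id-sets.txt")
    .map(_.split(",").map(_.trim).filter(_.nonEmpty))
    .filter(_.nonEmpty)

  // Link every id in a line to the first id of that line.
  val vertices = sets.flatMap(ids => ids.map(id => (id.hashCode.toLong, id)))
  val edges = sets.flatMap(ids =>
    ids.tail.map(id => Edge(ids.head.hashCode.toLong, id.hashCode.toLong, ())))

  // Ids that end up with the same component id belong to one merged cluster.
  val components = Graph(vertices, edges).connectedComponents().vertices
  val merged = components.join(vertices)
    .map { case (_, (comp, id)) => (comp, Set(id)) }
    .reduceByKey(_ ++ _)
    .values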
Hello,
I'm writing a Spark-based application which works on a pretty huge dataset
stored on S3. It's about **15 TB** in size uncompressed. The data is laid
out across many small LZO-compressed files, varying from 10-100 MB.
By default the job spawns 130k tasks while reading the dataset and mapping
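That task count roughly tracks the number of input files/splits. A minimal sketch (the path, partition counts, and byte values are illustrative only) of how one might pack several small files into each task, or reduce partitions after the read:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .config("spark.sql.files.maxPartitionBytes", 256L * 1024 * 1024) // pack up to ~256 MB of files per task
    .config("spark.sql.files.openCostInBytes", 4L * 1024 * 1024)     // lower the per-file open penalty
    .getOrCreate()

  val lines = spark.read.textFile("s3://bucket/path/")  // hypothetical dataset location
  val fewerPartitions = lines.coalesce(20000)            // illustrative target partition count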