Hi,
I was just curious: has anyone ever used Spark as an application-server
cache?
My use case is:
* I have large datasets that need to be updated/inserted (upserted) into
the database.
* I have actually found that it is much easier to run a spark-submit job
that pulls from the database, and co… (see the sketch below)
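
As a rough illustration of that flow, here is a minimal sketch of such a job. The JDBC URL, table name, and env-var credentials are placeholders, not details from this thread:
```
import org.apache.spark.sql.SparkSession

object UpsertCacheJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("upsert-cache").getOrCreate()

    // Pull the current state of the table from the database over JDBC.
    val current = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/appdb") // placeholder
      .option("dbtable", "target_table")                     // placeholder
      .option("user", sys.env("DB_USER"))
      .option("password", sys.env("DB_PASSWORD"))
      .load()

    // Keep the dataset in executor memory so repeated lookups don't hit the DB.
    current.cache()
    println(s"cached ${current.count()} rows")

    spark.stop()
  }
}
```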
It seemed I was not able to connect to sts.amazonaws.com. I fixed that error.
Now the Spark write to S3 is able to create the folder structure on S3, but
it fails on the final file write with the big error below:
org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter…
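
For reference, a minimal sketch (spark-shell style) of wiring temporary STS credentials into the S3A connector before the write; the bucket path and the environment variables are assumptions, not details from this thread:
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3-write").getOrCreate()
val hc = spark.sparkContext.hadoopConfiguration

// Temporary (STS) credentials need the session-token provider rather than
// the default access-key/secret-key provider.
hc.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
hc.set("fs.s3a.session.token", sys.env("AWS_SESSION_TOKEN"))

// Tasks first write under a temporary prefix (hence the folder structure you
// see); "Job aborted" surfaces at commit time, when the final files are moved.
spark.range(10).write.mode("overwrite").parquet("s3a://my-bucket/out/") // placeholder
```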
Hi Ranju,
Can you show the pods and their state? Does this happen at the very
beginning of spark-submit or some time later (once you've got a couple of
executors)? My reading is that either the driver or the executors are not
starting up due to a lack of resources. In either case they…
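
For example (the namespace and pod name are placeholders), the pod states and the reason a pod is stuck can be pulled with:
```
# List the driver/executor pods and their current phase.
kubectl get pods -n spark-ns

# The Events section at the bottom explains why a pod is Pending,
# e.g. "Insufficient cpu" or "Insufficient memory".
kubectl describe pod <pod-name> -n spark-ns
```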
Hi,
I submitted the Spark job and the pods go into the Pending state because of
insufficient resources. But they are not getting deleted after the 60-second
timeout. Please help me understand this.
Regards
Ranju
Hi,
I tried doing what Vladimir suggested, but no luck there either. My guess
is that it has something to do with securityContext.fsGroup. I am passing
the YAML file path along with the spark-submit command. My YAML file
content is:
```
apiVersion: v1
kind: Pod
spec:
  securityContext:
    fsGroup: …
```
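
In case it helps, a pod template file is normally handed to spark-submit via the podTemplateFile properties; the paths and master URL here are placeholders:
```
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.driver.podTemplateFile=/path/to/pod-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=/path/to/pod-template.yaml \
  ...
```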
Hi,
the fsGroup setting should match the id Spark is running as. When building
from source, that id is 185, and you can use "docker inspect <image>"
to double-check.
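
Concretely (the image name is a placeholder), the user baked into the image can be read straight from its config:
```
# Prints the USER the image runs as, e.g. "185".
docker inspect --format '{{.Config.User}}' my-spark-image
```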
On Wed, Feb 10, 2021 at 11:43 AM Rishabh Jain wrote: