ecret" \
--conf "spark.kubernetes.namespace=spark2" \
--conf "spark.executor.instances=4" \
--class SparkPi "local:///opt/jar/sparkpi_2.10-1.0.jar" 10
Of course, /opt/jar/sparkpi_2.10-1.0.jar is part of my Docker build.
Thank you in advance.
Antoine DUBOIS
CCIN2P3
number of cores and processing uncompressed data is
indeed faster.
My bottleneck seems to be the compression.
Thank you all and have a merry Christmas.
De: "ayan guha"
À: "Enrico Minack"
Cc: "Antoine DUBOIS" , "Chris Teoh"
, user@spark.apache.org
En
Also,
the framework allows executing all the modifications at the same time as one big
request (but I won't paste it here, it would not really be relevant
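Not the original code, but a minimal Scala sketch (spark-shell style, with made-up column names and transformations) of how Daria-style functions can be chained with .transform so that all modifications end up in one query plan:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical transformations; each is a DataFrame => DataFrame,
// so they can be composed and planned as a single query.
def parseSize(df: DataFrame): DataFrame =
  df.withColumn("size_bytes", col("size").cast("long"))

def addDepth(df: DataFrame): DataFrame =
  df.withColumn("depth", size(split(col("path"), "/")))

val input = Seq(("/data/a/b.txt", "42"), ("/data/c.txt", "7")).toDF("path", "size")

// Chained with .transform, everything is optimized and executed together.
val result = input.transform(parseSize).transform(addDepth)
result.show()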
De: "Antoine DUBOIS"
À: "Enrico Minack"
Cc: "Chris Teoh" , "user @spark"
Envoyé: Mercr
didn't have time to let it finish.
De: "Enrico Minack"
À: "Chris Teoh" , "Antoine DUBOIS"
Cc: "user @spark"
Envoyé: Mercredi 18 Décembre 2019 14:29:07
Objet: Re: Identify bottleneck
Good points, but single-line CSV files are splittable (n
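Not from the thread, but one way to see the splittability point in the spark-shell (file paths here are placeholders): an uncompressed single-line CSV can be split across several tasks, while a gzip-compressed copy of the same data has to be read by a single task:

val plain = spark.read.option("header", "true").csv("/data/fs_dump.csv")
val gzipped = spark.read.option("header", "true").csv("/data/fs_dump.csv.gz")

// Several partitions for the plain file (assuming it is larger than one split)...
println(plain.rdd.getNumPartitions)
// ...but only one partition for the gzip file, since gzip is not splittable.
println(gzipped.rdd.getNumPartitions)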
nack"
À: user@spark.apache.org, "Antoine DUBOIS"
Envoyé: Mercredi 18 Décembre 2019 11:13:38
Objet: Re: Identify bottleneck
How many withColumn statements do you have? Note that it is better to use a
single select, rather than lots of withColumn. This also makes drops redundant.
Readin
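Not from the original reply, but a short spark-shell sketch (with made-up columns) of what that suggestion looks like: many withColumn calls add one projection each, while a single select produces the same result in one projection and makes the drop unnecessary:

import org.apache.spark.sql.functions._

val df = Seq(("/data/a.txt", "42", "ALICE")).toDF("path", "size", "owner")

// Many withColumn calls plus a drop at the end...
val viaWithColumn = df
  .withColumn("size_bytes", col("size").cast("long"))
  .withColumn("owner", lower(col("owner")))
  .drop("size")

// ...versus one select expressing the same columns in a single projection.
val viaSelect = df.select(
  col("path"),
  col("size").cast("long").as("size_bytes"),
  lower(col("owner")).as("owner")
)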
Hello,
I'm working on an ETL that takes CSVs describing file systems and transforms them into
Parquet, so I can work on them easily to extract information.
I'm using Mr. Powers' Daria framework to do so. I have quite different inputs and a
lot of transformations, and the framework helps organize the code.
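Not the actual ETL, but a minimal spark-shell sketch of the CSV-to-Parquet step it describes; the paths, options and schema handling are assumptions, and the Daria transformations are left out:

// Read the CSV dump describing the file system (path and options are placeholders).
val fsDump = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/ceph/input/fs_dump.csv")

// ... the Daria transformations would be applied here ...

// Write Parquet so later queries only read the columns they need.
fsDump.write.mode("overwrite").parquet("/ceph/output/fs_dump")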
Hello,
I'm using Hadoop 3.1.2 with YARN and Spark 2.4.2.
I'm trying to read files compressed with the zstd command-line tool from the Spark shell.
However, after a long struggle to finally understand issues with library imports and
other things, I no longer get errors when trying to read those files.
However, if I tr
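For reference, a minimal spark-shell sketch of one way this can work, assuming the Hadoop native library was built with zstd support (hadoop checknative should list zstd) and that the files carry the .zst extension the zstd tool produces; paths are placeholders:

// Explicitly register Hadoop's zstd codec (it may already be picked up automatically).
spark.sparkContext.hadoopConfiguration.set(
  "io.compression.codecs",
  "org.apache.hadoop.io.compress.ZStandardCodec")

// The .zst extension is how the codec factory selects ZStandardCodec.
val df = spark.read.option("header", "true").csv("/ceph/input/fs_dump.csv.zst")
df.show(5)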
Hello,
I have a question regarding a use case.
I have an ETL using Spark that works great.
I use CephFS mounted on all Spark nodes to store data.
However, one problem I have is that bzip2 compression + transfer from the source to
Spark storage takes really long.
I would like to be able to process the file as
Hello,
I'm trying to mount a local Ceph volume into my Mesos container.
My CephFS is mounted on all agents at /ceph.
I'm using Spark 2.4 with Hadoop 3.11, and I'm not using Docker to deploy Spark.
The only option I could find to mount a volume, though, is the following (which
is also a line I added t
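The line itself is cut off in the archive, so purely for reference: the only volume-mount setting in the Spark-on-Mesos documentation appears to be spark.mesos.executor.docker.volumes, sketched below in Scala; note that it targets the Docker containerizer, which may be why it is a poor fit for a deployment that does not use Docker.

import org.apache.spark.SparkConf

// Sketch only: mounts the agent's /ceph into the executor container as read-write.
// This setting is documented for the Docker containerizer, not plain Mesos containers.
val conf = new SparkConf()
  .set("spark.mesos.executor.docker.volumes", "/ceph:/ceph:rw")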