The latency to start a Spark Job is nowhere close to 2-4 seconds under
typical conditions. You appear to be creating a new Spark Application
every time instead of running multiple Jobs in one Application.
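For example, a minimal sketch of one application running many jobs (the paths and
the loop are placeholders, not your code):

import org.apache.spark.sql.SparkSession

// One application = one SparkSession/SparkContext; the start-up cost is paid once.
val spark = SparkSession.builder()
  .appName("long-running-app")
  .getOrCreate()

// Each action below is a separate Spark *job* inside the same application,
// so it does not pay the driver/executor start-up latency again.
for (i <- 1 to 10) {
  val count = spark.read.parquet(s"/data/batch_$i").count()
  println(s"batch $i -> $count rows")
}

spark.stop()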
On Fri, Jul 6, 2018 at 3:12 AM Tien Dat wrote:
> Dear Timothy,
>
> It works like a charm now
I am running Spark 2.1.0 on AWS EMR.
In my Zeppelin note I am creating a table:
df.write
  .format("parquet")
  .saveAsTable("default.1test")
and I see the table when I run
spark.catalog.listTables().show()
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
I know there have been some community efforts shown at Spark Summits before,
mostly around reusing the same Spark context for multiple “jobs”.
As far as I know, reducing Spark job startup time is not a community priority.
Tim
On Fri, Jul 6, 2018 at 7:12 PM Tien Dat wrote:
> Dear Timothy,
>
> It works
Hi Tien,
There is no retry at the job level, as we expect the user to retry, and as
you mention, we already tolerate task retries.
There is no request/limit style resource configuration like the one you
described in Mesos (yet).
So for 2) that’s not possible at the moment.
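For illustration, a minimal sketch of such a driver-side retry (the helper and
the retry count are placeholders, not a built-in Spark or Mesos feature):

def runWithRetries[T](maxAttempts: Int)(job: => T): T = {
  var attempt = 0
  var result: Option[T] = None
  while (result.isEmpty) {
    attempt += 1
    try {
      result = Some(job)
    } catch {
      // rethrow on the last attempt; otherwise log and loop again
      case e: Exception if attempt < maxAttempts =>
        println(s"Job attempt $attempt failed: ${e.getMessage}; retrying")
    }
  }
  result.get
}

// Usage: wrap a whole action (a Spark "job") in the helper, e.g.
// val n = runWithRetries(maxAttempts = 3) { spark.read.parquet("/some/path").count() }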
Tim
On Fri, Jul 6, 2018 at 11:4
Nirav,
The withColumnRenamed() API might help, but it does not distinguish between
duplicate columns and renames all occurrences of the given column name.
Alternatively, use the select() API and rename the columns as you want.
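For example, a minimal sketch (df1/df2 and the column names are placeholders,
not your actual schema):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rename-demo").getOrCreate()
import spark.implicits._

val df1 = Seq((1, "x"), (2, "y")).toDF("a", "b")
val df2 = Seq((1, "p"), (2, "q")).toDF("a", "c")

// withColumnRenamed("a", ...) on the join result would hit every column named
// "a", so pick each one through its parent DataFrame and alias it in select().
val joined = df1.join(df2, df1("a") === df2("a"))
val renamed = joined.select(
  df1("a").alias("a_left"),
  df2("a").alias("a_right"),
  $"b",
  $"c"
)
renamed.show()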
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel wrote:
> Expr is `df1(a) ===
Dear all,
We are running Spark with Mesos as the resource manager. We are interested
in a few aspects, such as:
1. Is it possible to configure a specific job with a maximum number of
retries?
What I mean here is retry at the job level, NOT spark.task.maxFailures,
which applies to the tasks within a job.
> Hello,
>
>
>
> When I’m trying to set the below options on the spark-submit command for the
> k8s master, I am getting the below error in the spark-driver pod logs
>
>
>
> --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost
> -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \
>
> --conf
Dear Timothy,
It works like a charm now.
BTW (don't judge me if I am too greedy :-)), the latency to start a Spark job
is around 2-4 seconds, unless I am unaware of some awesome optimization in
Spark. Do you know if the Spark community is working on reducing this latency?
Best
--
Sent from: http
Got it, then you can have an extracted Spark directory at the same location
on each host, and don’t specify SPARK_EXECUTOR_URI. Instead, set
spark.mesos.executor.home to that directory.
This should effectively do what you want; it avoids fetching and extracting
and just executes the command.
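For example (the directory and master URL below are placeholders):

import org.apache.spark.sql.SparkSession

// Spark is already extracted to the same path on every agent, so point the
// executors there instead of fetching and extracting an archive.
val spark = SparkSession.builder()
  .appName("mesos-local-spark-home")
  .master("mesos://zk://zk1:2181/mesos")               // placeholder master URL
  .config("spark.mesos.executor.home", "/opt/spark")   // pre-extracted on each host
  // note: SPARK_EXECUTOR_URI / spark.executor.uri intentionally not set
  .getOrCreate()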
T
Hi Tathagata,
Is there any limitation in the code below when writing to multiple files?
val inputdf: DataFrame = sparkSession.readStream
  .schema(schema).format("csv")
  .option("delimiter", ",")
  .csv("src/main/streamingInput")
query1 = inputdf.writeStream
  .option("path", "first_output").option("checkpoin
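A minimal self-contained sketch of this pattern (the schema, sink format and the
second query are assumptions filled in for illustration):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val sparkSession = SparkSession.builder().appName("multi-sink-sketch").getOrCreate()

// placeholder schema for the CSV source
val schema = new StructType()
  .add("id", IntegerType)
  .add("value", StringType)

val inputdf: DataFrame = sparkSession.readStream
  .schema(schema)
  .option("delimiter", ",")
  .csv("src/main/streamingInput")

// Two independent queries off the same input; each needs its own output path
// AND its own checkpoint location.
val query1 = inputdf.writeStream
  .format("parquet")
  .option("path", "first_output")
  .option("checkpointLocation", "first_output_ckpt")
  .start()

val query2 = inputdf.writeStream
  .format("parquet")
  .option("path", "second_output")
  .option("checkpointLocation", "second_output_ckpt")
  .start()

sparkSession.streams.awaitAnyTermination()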
Thank you for your answer.
The thing is, I actually pointed to a local binary file. And Mesos locally
copied the binary file to a specific folder in /var/lib/mesos/... and
extracted it every time it launched a Spark executor. With the fetch
cache, the copy time is reduced, but the reduction is no
If it's available locally on each host, then don't specify a remote URL but
a local file URI instead.
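For example (the archive path and master URL are placeholders):

import org.apache.spark.sql.SparkSession

// The executor archive already exists on every host, so use a file:// URI
// rather than a remote http:// or hdfs:// URL; the Mesos fetcher cache
// (mentioned below) can then avoid repeated copies.
val spark = SparkSession.builder()
  .appName("local-executor-uri")
  .master("mesos://zk://zk1:2181/mesos")  // placeholder master URL
  .config("spark.executor.uri", "file:///opt/dist/spark-2.1.0-bin-hadoop2.7.tgz")
  .config("spark.mesos.fetcherCache.enable", "true")
  .getOrCreate()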
We added a fetcher cache to Mesos a while ago, and I believe there is
integration in the Spark framework as well if you look at the documentation.
With the fetcher cache enabled, the Mesos agent will cache t
Dear all,
We are running Spark with Mesos as the master for resource management.
In our cluster, there are jobs that require a very short response time (near
real-time applications), usually around 3-5 seconds.
In order for Spark to execute with Mesos, one has to specify the
SPARK_EXECUTOR_URI
unsubscribe