Guys,
Don't TaskScheduler and DAGScheduler reside in the SparkContext? So the
debug configs need to be set in the JVM where the SparkContext is
running? [1]
But yes, I agree that if you really need to inspect the execution, you need
to set those configs on the executors. [2]
[1]
https://jaceklask
I don't think the Spark optimizer supports something like a statement cache,
where the plan is cached and bind variables (as in an RDBMS) are used for
different values, thus saving the parsing.
What you're stating is that the source and tempTable change but the plan
itself remains the same. I have not seen this.
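To make that concrete, a tiny sketch (Spark 2.0 API; the temp view name is made up): re-running structurally identical SQL with a different literal goes through parsing, analysis and optimization again rather than reusing the first plan.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("no-plan-cache-sketch").getOrCreate()

// Throwaway temp view so the SQL below has something to resolve against.
spark.range(100).toDF("id").createOrReplaceTempView("events")

// Each call is planned independently; there is no prepared-statement style
// cache that would reuse q1's plan with a different bind value.
val q1 = spark.sql("SELECT * FROM events WHERE id = 1")
val q2 = spark.sql("SELECT * FROM events WHERE id = 2")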
drop user@spark and keep only dev@
This would be a great thing to figure out, if you have time. Two things
worth trying:
1. See how this works on Spark 2.0.
2. If it is slow, try the following:
org.apache.spark.sql.catalyst.rules.RuleExecutor.resetTime()
// run your query
org.apache
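The snippet above is cut off; here is a sketch of what the full spark-shell sequence presumably looks like, assuming the truncated call is RuleExecutor.dumpTimeSpent() (the companion of resetTime() in Spark 2.0-era Catalyst, so treat the exact method names as version-dependent):

import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Reset Catalyst's per-rule timing counters (Spark 2.0-era API).
RuleExecutor.resetTime()

// run your query here, e.g.
// spark.sql("SELECT ...").collect()

// Print the cumulative time spent in each analyzer/optimizer rule.
// dumpTimeSpent() is my assumption for the call that was cut off above.
println(RuleExecutor.dumpTimeSpent())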
Which version are you using here? If the underlying files change,
technically we should go through optimization again.
Perhaps the real "fix" is to figure out why logical plan creation is so
slow for 700 columns.
On Thu, Jun 30, 2016 at 1:58 PM, Darshan Singh
wrote:
> Is there a way I can use
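For anyone who wants to reproduce the symptom, a rough sketch of isolating plan-creation time for a wide DataFrame (Spark 2.0 API; the 700-column projection and the System.nanoTime timing are illustrative only, and no job is actually executed):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("wide-plan-timing").getOrCreate()

val start = System.nanoTime()
// Building the Dataset runs the analyzer; forcing optimizedPlan then runs
// the optimizer, so the timed block covers plan creation but not execution.
val wide = spark.range(10).select((1 to 700).map(i => lit(i).as(s"c$i")): _*)
wide.queryExecution.optimizedPlan
val elapsedMs = (System.nanoTime() - start) / 1e6
println(s"logical plan creation + optimization took $elapsedMs ms")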
Yes, scheduling is centralized in the driver.
For debugging, I think you'd want to set the flags on the executor JVMs,
not on the worker JVM.
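Something along these lines should do it, as a sketch only. spark.executor.extraJavaOptions is a real Spark config, but the JDWP agent string and port are illustrative, and the driver-side counterpart (spark.driver.extraJavaOptions) has to be supplied at launch time (spark-submit --conf or spark-defaults.conf) because that JVM is already running when application code executes.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: attach a JDWP debug agent to each executor JVM (not the
// standalone worker daemon). With several executors per host you would
// need distinct ports, or address=0 to let each JVM pick a free one.
val conf = new SparkConf()
  .setAppName("executor-debug-sketch")
  .set("spark.executor.extraJavaOptions",
    "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005")

val sc = new SparkContext(conf)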
On Thu, Jun 30, 2016 at 11:36 AM, cbruegg wrote:
> Hello everyone,
>
> I'm a student assistant in research at the University of Paderborn, working
> on integrating
Hello everyone,
I'm a student research assistant at the University of Paderborn, working
on integrating Spark (v1.6.2) with a new network resource management system.
I have already taken a deep dive into the source code of spark-core w.r.t.
its scheduling systems.
We are running a cluster in s
OK, thanks. I'll wait for it to appear.
On Thu, 30 Jun 2016 at 14:51 Sean Owen wrote:
> TD has literally just merged the fix.
>
> On Thu, Jun 30, 2016 at 2:37 PM, Pete Robbins wrote:
> > Our build on branch-2.0 is failing after the PR for updating kafka to
> 0.10.
> > The new kafka pom.xml files a
TD has literally just merged the fix.
On Thu, Jun 30, 2016 at 2:37 PM, Pete Robbins wrote:
> Our build on branch-2.0 is failing after the PR for updating kafka to 0.10.
> The new kafka pom.xml files are naming the parent version as 2.0.0-SNAPSHOT
> but the branch 2.0 poms have been updated to 2.0
Our build on branch-2.0 is failing after the PR for updating kafka to 0.10.
The new kafka pom.xml files are naming the parent version as 2.0.0-SNAPSHOT
but the branch 2.0 poms have been updated to 2.0.1-SNAPSHOT after the rc1
cut. Shouldn't the pom versions remain as 2.0.0-SNAPSHOT until a 2.0.0 ha
Hi Nishadi,
I have not seen bloom filters in Spark. They are mentioned as part of the ORC
file format, but I don't know whether Spark uses them:
https://orc.apache.org/docs/spec-index.html. Parquet has block-level min/max
values, null counts, etc. for leaf columns in its metadata. I don't believe
Sp
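On the Parquet side, those min/max and null-count statistics are what filter pushdown can use to skip row groups. A minimal sketch (spark.sql.parquet.filterPushdown is a real config that defaults to true; the path and column name below are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-pushdown-sketch").getOrCreate()

// Keep Parquet filter pushdown enabled (it is by default); row groups whose
// min/max statistics rule out the predicate can then be skipped at scan time.
spark.conf.set("spark.sql.parquet.filterPushdown", "true")

// Hypothetical dataset and column, for illustration only.
val events = spark.read.parquet("/tmp/events.parquet")
val recent = events.filter(events("event_ts") > "2016-06-01")

// The pushed predicates show up in the scan node of the physical plan
// (look for "PushedFilters" in the output).
recent.explain()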
I filed two JIRAs:
1) Performance when querying nested columns
https://issues.apache.org/jira/browse/SPARK-16320
2) PySpark performance
https://issues.apache.org/jira/browse/SPARK-16321
I found existing JIRAs for:
1) PPD on nested columns
https://issues.apache.org/jira/browse/SPARK-5151
2) Drop of support