Guys,
Don't TaskScheduler and DAGScheduler reside in the SparkContext? So the
debug configs need to be set in the JVM where the SparkContext is
running? [1]
But yes, I agree that if you really need to inspect the execution, you need
to set those configs on the executors. [2]
[1]
https://jaceklask
I don't think the Spark optimizer supports something like a statement cache,
where the plan is cached and bind variables (as in an RDBMS) are used for
different values, thus saving the parsing.
What you're stating is that the source and tempTable change but the plan
itself remains the same. I have not seen this.
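To make that concrete, a tiny sketch (Spark 2.0 API; the temp view name is made up): re-running structurally identical SQL with a different literal goes through parsing, analysis and optimization again rather than reusing the first plan.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("no-plan-cache-sketch").getOrCreate()

// Throwaway temp view so the SQL below has something to resolve against.
spark.range(100).toDF("id").createOrReplaceTempView("events")

// Each call is planned independently; there is no prepared-statement style
// cache that would reuse q1's plan with a different bind value.
val q1 = spark.sql("SELECT * FROM events WHERE id = 1")
val q2 = spark.sql("SELECT * FROM events WHERE id = 2")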
drop user@spark and keep only dev@
This would be a great thing to figure out, if you have time. Two things
worth trying:
1. See how this works on Spark 2.0.
2. If it is slow, try the following:
org.apache.spark.sql.catalyst.rules.RuleExecutor.resetTime()
// run your query
org.apache
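The snippet above is cut off; here is a sketch of what the full spark-shell sequence presumably looks like, assuming the truncated call is RuleExecutor.dumpTimeSpent() (the companion of resetTime() in Spark 2.0-era Catalyst, so treat the exact method names as version-dependent):

import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Reset Catalyst's per-rule timing counters (Spark 2.0-era API).
RuleExecutor.resetTime()

// run your query here, e.g.
// spark.sql("SELECT ...").collect()

// Print the cumulative time spent in each analyzer/optimizer rule.
// dumpTimeSpent() is my assumption for the call that was cut off above.
println(RuleExecutor.dumpTimeSpent())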
Which version are you using here? If the underlying files change,
technically we should go through optimization again.
Perhaps the real "fix" is to figure out why logical plan creation is so
slow for 700 columns.
On Thu, Jun 30, 2016 at 1:58 PM, Darshan Singh
wrote:
> Is there a way I can use
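For anyone who wants to reproduce the symptom, a rough sketch of isolating plan-creation time for a wide DataFrame (Spark 2.0 API; the 700-column projection and the System.nanoTime timing are illustrative only, and no job is actually executed):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("wide-plan-timing").getOrCreate()

val start = System.nanoTime()
// Building the Dataset runs the analyzer; forcing optimizedPlan then runs
// the optimizer, so the timed block covers plan creation but not execution.
val wide = spark.range(10).select((1 to 700).map(i => lit(i).as(s"c$i")): _*)
wide.queryExecution.optimizedPlan
val elapsedMs = (System.nanoTime() - start) / 1e6
println(s"logical plan creation + optimization took $elapsedMs ms")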
Yes, scheduling is centralized in the driver.
For debugging, I think you'd want to set the flags on the executor JVMs,
not on the worker JVM.
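Something along these lines should do it, as a sketch only. spark.executor.extraJavaOptions is a real Spark config, but the JDWP agent string and port are illustrative, and the driver-side counterpart (spark.driver.extraJavaOptions) has to be supplied at launch time (spark-submit --conf or spark-defaults.conf) because that JVM is already running when application code executes.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: attach a JDWP debug agent to each executor JVM (not the
// standalone worker daemon). With several executors per host you would
// need distinct ports, or address=0 to let each JVM pick a free one.
val conf = new SparkConf()
  .setAppName("executor-debug-sketch")
  .set("spark.executor.extraJavaOptions",
    "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005")

val sc = new SparkContext(conf)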
On Thu, Jun 30, 2016 at 11:36 AM, cbruegg wrote:
> Hello everyone,
>
> I'm a student assistant in research at the University of Paderborn, working
> on integrating
Hello everyone,
I'm a student research assistant at the University of Paderborn, working
on integrating Spark (v1.6.2) with a new network resource management system.
I have already taken a deep dive into the source code of spark-core w.r.t.
its scheduling systems.
We are running a cluster in s
OK, thanks. I'll wait for it to appear.
On Thu, 30 Jun 2016 at 14:51 Sean Owen wrote:
> TD has literally just merged the fix.
>
> On Thu, Jun 30, 2016 at 2:37 PM, Pete Robbins wrote:
> > Our build on branch-2.0 is failing after the PR for updating kafka to
> 0.10.
> > The new kafka pom.xml files a
TD has literally just merged the fix.
On Thu, Jun 30, 2016 at 2:37 PM, Pete Robbins wrote:
> Our build on branch-2.0 is failing after the PR for updating kafka to 0.10.
> The new kafka pom.xml files are naming the parent version as 2.0.0-SNAPSHOT
> but the branch 2.0 poms have been updated to 2.0
Our build on branch-2.0 is failing after the PR for updating kafka to 0.10.
The new kafka pom.xml files are naming the parent version as 2.0.0-SNAPSHOT
but the branch 2.0 poms have been updated to 2.0.1-SNAPSHOT after the rc1
cut. Shouldn't the pom versions remain as 2.0.0-SNAPSHOT until a 2.0.0 ha
Hi Nishadi,
I have not seen bloom filters in Spark. They are mentioned as part of the ORC
file format, but I don't know whether Spark uses them:
https://orc.apache.org/docs/spec-index.html. Parquet has block-level min/max
values, null counts, etc. for leaf columns in its metadata. I don't believe
Sp
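On the Parquet side, those min/max and null-count statistics are what filter pushdown can use to skip row groups. A minimal sketch (spark.sql.parquet.filterPushdown is a real config that defaults to true; the path and column name below are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-pushdown-sketch").getOrCreate()

// Keep Parquet filter pushdown enabled (it is by default); row groups whose
// min/max statistics rule out the predicate can then be skipped at scan time.
spark.conf.set("spark.sql.parquet.filterPushdown", "true")

// Hypothetical dataset and column, for illustration only.
val events = spark.read.parquet("/tmp/events.parquet")
val recent = events.filter(events("event_ts") > "2016-06-01")

// The pushed predicates show up in the scan node of the physical plan
// (look for "PushedFilters" in the output).
recent.explain()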
I filed two JIRAs:
1) Performance when querying nested columns
https://issues.apache.org/jira/browse/SPARK-16320
2) PySpark performance
https://issues.apache.org/jira/browse/SPARK-16321
I found existing JIRAs for:
1) PPD on nested columns
https://issues.apache.org/jira/browse/SPARK-5151
2) Drop of support