Some new features are about to land in Spark to improve its ability to
handle bad executors and nodes. These are significant changes, and
we'd like to gather more input from the community about them, especially
from folks who use *large clusters*.
We've spent a lot of time discussing the right
See
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L80
The Project operator preserves its child's sort ordering, but it does not
preserve the child's output partitioning. I don't see any way projection
would alter the partitioning of the child
On Tue, Oct 11, 2016 at 10:57 AM, Reynold Xin wrote:
>
> On Tue, Oct 11, 2016 at 10:55 AM, Michael Armbrust wrote:
>
>> *Complex event processing and state management:* Several groups I've
>>> talked to want to run a large number (tens or hundreds of thousands now,
>>> millions in the near future
It actually does -- but it does it in a really weird way.
UnaryExecNode actually defines:
trait UnaryExecNode extends SparkPlan {
  def child: SparkPlan
  override final def children: Seq[SparkPlan] = child :: Nil
  override def outputPartitioning: Partitioning = child.outputPartitioning
}
I
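A minimal, self-contained sketch of the pass-through pattern described above, using hypothetical simplified stand-ins (`Plan`, `Partitioning`, `Leaf`, `Project`) rather than Spark's real classes:

```scala
// Hypothetical, simplified stand-ins for SparkPlan/Partitioning to
// illustrate the inheritance pattern; not Spark's actual classes.
case class Partitioning(numPartitions: Int)

trait Plan {
  def children: Seq[Plan]
  def outputPartitioning: Partitioning
}

trait UnaryNode extends Plan {
  def child: Plan
  override final def children: Seq[Plan] = child :: Nil
  // Default: simply pass through the child's partitioning.
  override def outputPartitioning: Partitioning = child.outputPartitioning
}

case class Leaf(p: Partitioning) extends Plan {
  def children: Seq[Plan] = Nil
  def outputPartitioning: Partitioning = p
}

// A Project-like node that does not override outputPartitioning...
case class Project(child: Plan) extends UnaryNode

val plan = Project(Leaf(Partitioning(8)))
// ...still reports its child's partitioning via the trait default.
println(plan.outputPartitioning.numPartitions) // prints 8
```

So a unary operator that says nothing about partitioning still advertises its child's partitioning, which is why it looks like Project "doesn't preserve it" when reading only the operator's own source.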
Sure :)
Thanks,
Tejas
On Wed, Oct 12, 2016 at 11:26 AM, Reynold Xin wrote:
> It actually does -- but it does it in a really weird way.
>
> UnaryExecNode actually defines:
>
> trait UnaryExecNode extends SparkPlan {
>   def child: SparkPlan
>
>   override final def children: Seq[SparkPlan] = child :: Nil
I see this warning when running jobs on a cluster:
2016-10-12 14:46:47 WARN spark.SparkContext: Spark is not running in local
mode, therefore the checkpoint directory must not be on the local
filesystem. Directory '/tmp' appears to be on the local filesystem.
However, the checkpoint "directory" that
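The warning is asking for a checkpoint directory on a shared filesystem rather than a node-local path like `/tmp`. A sketch of the usual fix (the HDFS URI is a placeholder; substitute your cluster's shared filesystem):

```scala
// Sketch only; "hdfs://namenode:8020/user/me/checkpoints" is a
// placeholder URI -- substitute your cluster's shared filesystem.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))

// On a cluster, the checkpoint dir must live on a filesystem that every
// executor can reach (HDFS, S3, ...), hence the warning about '/tmp'
// appearing to be local.
sc.setCheckpointDir("hdfs://namenode:8020/user/me/checkpoints")

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint() // materialized on the next action
rdd.count()
```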
I'm not sure this is applied consistently across Spark, but I'm dealing
with another change now where an unqualified path is assumed to be a local
file. The method Utils.resolvePath implements this logic and is used in
several places. Therefore I think this is probably intended behavior, and
you can wr
I am getting excessive memory-leak warnings when running multiple mappings
and aggregations with Datasets. Is there anything I should be looking for
to resolve this, or is this a known issue?
WARN [Executor task launch worker-0]
org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory f
As a very heavy Spark user at Parse.ly, I just wanted to give a +1 to all of
the issues raised by Holden and Ricardo. I'm also giving a talk at PyCon
Canada on PySpark: https://2016.pycon.ca/en/schedule/096-mike-sukmanowsky/.
Being a Python shop, we were extremely pleased to learn about PySpark a fe
Some of you guys may have already seen this, but in case you haven't, you
may want to check it out.
http://www.slideshare.net/sbaltagi/flink-vs-spark
On Tue, Oct 11, 2016 at 1:57 PM, Ryan Blue
wrote:
> I don't think we will have trouble with whatever rule that is adopted for
> accepting prop
I took a look at all the public APIs we expose in o.a.spark.sql tonight,
and realized we still have a large number of APIs that are marked
experimental. Most of these haven't really changed, except in 2.0 we merged
DataFrame and Dataset. I think it's long overdue to mark them stable.
I'm tracking