Some new features are about to land in Spark to improve its ability to
handle bad executors and nodes. These are significant changes, and
we'd like to gather more input from the community about them, especially
from folks who use *large clusters*.
We've spent a lot of time discussing the right
See
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L80
The Project operator preserves its child's sort ordering, but it does not
preserve the child's output partitioning. I don't see any way projection
would alter the partitioning of the child
On Tue, Oct 11, 2016 at 10:57 AM, Reynold Xin wrote:
>
> On Tue, Oct 11, 2016 at 10:55 AM, Michael Armbrust wrote:
>
>> *Complex event processing and state management:* Several groups I've
>>> talked to want to run a large number (tens or hundreds of thousands now,
>>> millions in the near future
It actually does -- but it does it in a really weird way.
UnaryExecNode actually defines:
trait UnaryExecNode extends SparkPlan {
  def child: SparkPlan
  override final def children: Seq[SparkPlan] = child :: Nil
  override def outputPartitioning: Partitioning = child.outputPartitioning
}
I
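A minimal, self-contained sketch of the pass-through pattern described above, using hypothetical simplified stand-ins (`Plan`, `Partitioning`, `Leaf`, `Project`) rather than Spark's real classes:

```scala
// Hypothetical, simplified stand-ins for SparkPlan/Partitioning to
// illustrate the inheritance pattern; not Spark's actual classes.
case class Partitioning(numPartitions: Int)

trait Plan {
  def children: Seq[Plan]
  def outputPartitioning: Partitioning
}

trait UnaryNode extends Plan {
  def child: Plan
  override final def children: Seq[Plan] = child :: Nil
  // Default: simply pass through the child's partitioning.
  override def outputPartitioning: Partitioning = child.outputPartitioning
}

case class Leaf(p: Partitioning) extends Plan {
  def children: Seq[Plan] = Nil
  def outputPartitioning: Partitioning = p
}

// A Project-like node that does not override outputPartitioning...
case class Project(child: Plan) extends UnaryNode

val plan = Project(Leaf(Partitioning(8)))
// ...still reports its child's partitioning via the trait default.
println(plan.outputPartitioning.numPartitions) // prints 8
```

So a unary operator that says nothing about partitioning still advertises its child's partitioning, which is why it looks like Project "doesn't preserve it" when reading only the operator's own source.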
Sure :)
Thanks,
Tejas
On Wed, Oct 12, 2016 at 11:26 AM, Reynold Xin wrote:
> It actually does -- but it does it in a really weird way.
>
> UnaryExecNode actually defines:
>
> trait UnaryExecNode extends SparkPlan {
>   def child: SparkPlan
>
>   override final def children: Seq[SparkPlan] = child :: Nil
I see this warning when running jobs on a cluster:
2016-10-12 14:46:47 WARN spark.SparkContext: Spark is not running in local
mode, therefore the checkpoint directory must not be on the local
filesystem. Directory '/tmp' appears to be on the local filesystem.
However, the checkpoint "directory" that
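The warning is asking for a checkpoint directory on a shared filesystem rather than a node-local path like `/tmp`. A sketch of the usual fix (the HDFS URI is a placeholder; substitute your cluster's shared filesystem):

```scala
// Sketch only; "hdfs://namenode:8020/user/me/checkpoints" is a
// placeholder URI -- substitute your cluster's shared filesystem.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))

// On a cluster, the checkpoint dir must live on a filesystem that every
// executor can reach (HDFS, S3, ...), hence the warning about '/tmp'
// appearing to be local.
sc.setCheckpointDir("hdfs://namenode:8020/user/me/checkpoints")

val rdd = sc.parallelize(1 to 100).map(_ * 2)
rdd.checkpoint() // materialized on the next action
rdd.count()
```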
I'm not sure this is applied consistently across Spark, but I'm dealing
with another change now where an unqualified path is assumed to be a local
file. The method Utils.resolvePath implements this logic and is used in
several places. Therefore I think this is probably intended behavior, and
you can wr
I am getting excessive memory-leak warnings when running multiple mappings
and aggregations with Datasets. Is there anything I should be looking for
to resolve this, or is this a known issue?
WARN [Executor task launch worker-0]
org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory f
As a very heavy Spark user at Parse.ly, I just wanted to give a +1 to all of
the issues raised by Holden and Ricardo. I'm also giving a talk at PyCon
Canada on PySpark: https://2016.pycon.ca/en/schedule/096-mike-sukmanowsky/.
Being a Python shop, we were extremely pleased to learn about PySpark a fe
Some of you guys may have already seen this, but in case you haven't, you
may want to check it out.
http://www.slideshare.net/sbaltagi/flink-vs-spark
On Tue, Oct 11, 2016 at 1:57 PM, Ryan Blue
wrote:
> I don't think we will have trouble with whatever rule that is adopted for
> accepting prop
I took a look at all the public APIs we expose in o.a.spark.sql tonight,
and realized we still have a large number of APIs that are marked
experimental. Most of these haven't really changed, except in 2.0 we merged
DataFrame and Dataset. I think it's long overdue to mark them stable.
I'm tracking