I think it's cool that the Mesos team did a survey of usage and published
the aggregate results. It would be cool to do a survey for the Spark
project and publish the results on the Spark website like the Mesos team
did.
-- Forwarded message --
From: "Dave Lester"
Date: Jun 24, 201
Due to SPARK-2245, you cannot use count to materialize a VertexRDD. That
actually materializes the PartitionRDD, so checkpointing the VertexRDD won't
work. I'll try to fix that right now.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Checkpointed-RDD-still
This would be really useful, especially for Shark, where a shift in
partitioning affects all subsequent queries unless task scheduling time
beats spark.locality.wait. It can cause low overall performance for all
subsequent tasks.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur
Do not call collect, as that will perform materialization as well as
transfer the data to the driver (it might actually cause the driver to fail
if the data is huge). You have to materialize the RDD in some way (call
save, count, or collect).
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@may
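
A minimal PySpark sketch of the advice above: mark the RDD for checkpointing, then force it with count so that only a number, not the data, comes back to the driver. The helper name checkpoint_and_materialize, the demo function, the local[2] master and the /tmp/spark-checkpoints directory are all assumptions made for illustration; none of them come from the thread.

```python
def checkpoint_and_materialize(sc, data, checkpoint_dir):
    """Mark an RDD for checkpointing, then force it with count().

    `sc` is expected to be a pyspark.SparkContext. The function name and
    its arguments are hypothetical, chosen only for this sketch.
    """
    sc.setCheckpointDir(checkpoint_dir)
    rdd = sc.parallelize(data)
    # checkpoint() only marks the RDD; it must be called before any job
    # has run on this RDD, and nothing is written at this point.
    rdd.checkpoint()
    # count() is an action, so it runs a job and materializes the RDD.
    # Only the resulting number is sent back to the driver.
    n = rdd.count()
    return rdd, n


def demo():
    # Imported here so the helper above can be read without PySpark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "checkpoint-sketch")  # assumed local master
    try:
        rdd, n = checkpoint_and_materialize(
            sc, range(1000), "/tmp/spark-checkpoints"  # assumed directory
        )
        print(n, rdd.isCheckpointed())
    finally:
        sc.stop()
```

Calling demo() where PySpark is installed runs the sketch end to end. Swapping count for collect would also materialize the RDD, but it would pull every element to the driver, which is the failure mode the message warns about.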
Hi Matei, thanks for the comments.
On Mon, Jun 23, 2014 at 7:58 PM, Matei Zaharia wrote:
> When we did the configuration pull request, we actually avoided having a big
> list of defaults in one class file, because this creates a file that all the
> components in the project depend on. For examp