Re: Current status of Sparrow

2014-06-23 Thread Kay Ousterhout
Hi Liquan, Sparrow is not currently integrated into the Spark distribution, so if you'd like to use Spark with Sparrow, you need to use a forked version of Spark (https://github.com/kayousterhout/spark/tree/sparrow). This version of Spark was forked off an older version of Spark so some work will

Re: Checkpointed RDD still causing StackOverflow

2014-06-23 Thread Xiangrui Meng
Calling checkpoint() alone doesn't cut the lineage. It only marks the RDD as to be checkpointed. The lineage is cut after the first time this RDD is materialized. You see StackOverflow because the lineage is still there. -Xiangrui On Sun, Jun 22, 2014 at 6:37 PM, dash wrote: > Hi Xiangrui, > > Ac
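
The mark-then-materialize behavior described in the message above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: ToyRDD and its lineage_depth() helper are hypothetical names invented here, and the real implementation writes checkpoint data to reliable storage rather than keeping it in memory.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's real class)."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent   # upstream ToyRDD, i.e. the lineage link
        self.fn = fn           # per-element transformation
        self.data = data       # materialized values, if any
        self.marked = False    # set by checkpoint()

    def map(self, fn):
        # Lazy: only records a new lineage step.
        return ToyRDD(parent=self, fn=fn)

    def checkpoint(self):
        # Only marks the RDD; the lineage is left untouched.
        self.marked = True

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def collect(self):
        # Walk back to the nearest node that already holds data.
        chain, node = [], self
        while node.data is None:
            chain.append(node)
            node = node.parent
        result = list(node.data)
        # Replay the recorded transformations, oldest first.
        for step in reversed(chain):
            result = [step.fn(x) for x in result]
        if self.marked:
            # First materialization of a marked RDD: cut the lineage.
            self.data, self.parent, self.fn = result, None, None
            self.marked = False
        return result


rdd = ToyRDD(data=[1, 2, 3])
for _ in range(5):
    rdd = rdd.map(lambda x: x + 1)

rdd.checkpoint()
depth_after_mark = rdd.lineage_depth()     # lineage is still there
values = rdd.collect()                     # first materialization
depth_after_action = rdd.lineage_depth()   # lineage has been cut
```

In this toy model the depth only drops once collect() has run, which mirrors the point in the message: marking alone leaves the full chain in place.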

Re: Problems with Pyspark + Dill tests

2014-06-23 Thread Mark Baker
On Thu, Jun 19, 2014 at 3:56 PM, Josh Rosen wrote: > Thanks for helping with the Dill integration; I had some early first > attempts, but had to set them aside when I got busy with some other work. > > Just to bring everyone up to speed regarding context: > There are some objects that PySpark’s `

RFC: [SPARK-529] Create constants for known config variables.

2014-06-23 Thread Marcelo Vanzin
I started with some code to implement an idea I had for SPARK-529, and before going much further (since it's a large and kinda boring change) I'd like to get some feedback from people. Current code is at: https://github.com/vanzin/spark/tree/SPARK-529 There are still some parts I haven't fully fl

Re: RFC: [SPARK-529] Create constants for known config variables.

2014-06-23 Thread Matei Zaharia
Hey Marcelo, When we did the configuration pull request, we actually avoided having a big list of defaults in one class file, because this creates a file that all the components in the project depend on. For example, since we have some settings specific to streaming and the REPL, do we want tho

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-06-23 Thread Reynold Xin
Mridul, Can you comment a little bit more on this issue? We are running into the same stack trace but not sure whether it is just different Spark versions on each cluster (doesn't seem likely) or a bug in Spark. Thanks. On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan wrote: > I suspect

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-06-23 Thread Mridul Muralidharan
There are a few interacting issues here - and unfortunately I don't recall all of it (since this was fixed a few months back). From memory though: a) With shuffle consolidation, data sent to remote node incorrectly includes data from partially constructed blocks - not just the request blocks. Act