java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2

2014-09-10 Thread Jeffrey Picard
Hey guys,
After rebuilding from the master branch this morning, I’ve started to see these errors that I’ve never gotten before while running connected components. Anyone seen this before?
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 87 spilling in-memory batch of 1020 MB to disk (1

java.nio.channels.CancelledKeyException in Graphx Connected Components

2014-08-18 Thread Jeffrey Picard
Hey all,
I’m trying to run connected components in GraphX on about 400GB of data on 50 m3.xlarge nodes on EMR. I keep getting java.nio.channels.CancelledKeyException when it gets to “mapPartitions at VertexRDD.scala:347”. I haven’t been able to find much about this online, and nothing that seem

Re: GraphX Connected Components

2014-07-30 Thread Jeffrey Picard
On Jul 30, 2014, at 4:39 PM, Ankur Dave wrote:
> Jeffrey Picard writes:
>> I tried unpersisting the edges and vertices of the graph by hand, then
>> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
>> the same behavior in connected components

Re: GraphX Connected Components

2014-07-30 Thread Jeffrey Picard
On Jul 30, 2014, at 5:18 AM, Ankur Dave wrote:
> Jeffrey Picard writes:
>> As the program runs I’m seeing each iteration take longer and longer to
>> complete. This seems counterintuitive to me, especially since I am seeing
>> the shuffle read/write amounts decrease w
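For context on the shrinking shuffle amounts mentioned above: as far as I understand it, GraphX's connected components runs as Pregel-style minimum-label propagation, where every vertex starts with its own ID as its label and only vertices whose label changed keep sending messages. The plain-Python sketch below (no Spark; the function name `connected_components` is hypothetical) illustrates that idea under those assumptions and counts how many vertices stay active each round. It does not model Spark's scheduling, caching, or RDD lineage, so it says nothing about why per-iteration wall-clock time might grow.

```python
from collections import defaultdict


def connected_components(edges):
    """Min-label propagation over an undirected edge list (illustrative sketch).

    Returns (labels, active_counts): labels maps each vertex to the smallest
    vertex id in its component; active_counts[i] is how many vertices changed
    label in round i (and would therefore send messages in round i + 1).
    Only vertices that appear in some edge are included.
    """
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Every vertex starts labelled with its own id, and all are active.
    labels = {v: v for v in neighbors}
    active = set(labels)
    active_counts = []

    while active:
        # "Send messages": each active vertex offers its label to its neighbours,
        # keeping only the smallest offer that beats the neighbour's current label.
        incoming = {}
        for v in active:
            for n in neighbors[v]:
                if labels[v] < incoming.get(n, labels[n]):
                    incoming[n] = labels[v]
        # "Apply messages": only vertices whose label shrank remain active.
        active = set()
        for v, label in incoming.items():
            if label < labels[v]:
                labels[v] = label
                active.add(v)
        active_counts.append(len(active))

    return labels, active_counts
```

On a simple path graph, for example `connected_components([(i, i + 1) for i in range(1, 6)])`, the number of active vertices drops by one per round until nothing changes, which is the kind of shrinking message volume that would show up as decreasing shuffle read/write in a distributed run.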

GraphX Connected Components

2014-07-29 Thread Jeffrey Picard
art files (stored on s3) and it finishes in about 12 minutes, but with all the data I’ve let it run up to 4 hours and it still doesn’t complete. Does anyone have ideas for approaches to troubleshooting this, Spark parameters that might need to be tuned, etc.?
Best Regards,
Jeffrey Picard