Hey guys,
After rebuilding from the master branch this morning, I’ve started to see these
errors that I’ve never gotten before while running connected components. Anyone
seen this before?
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 87 spilling in-memory
batch of 1020 MB to disk (1
Hey all,
I’m trying to run connected components in GraphX on about 400GB of data on 50
m3.xlarge nodes on EMR. I keep getting java.nio.channels.CancelledKeyException
when it gets to "mapPartitions at VertexRDD.scala:347". I haven’t been able to
find much about this online, and nothing that seem
On Jul 30, 2014, at 4:39 PM, Ankur Dave wrote:
> Jeffrey Picard writes:
>> I tried unpersisting the edges and vertices of the graph by hand, then
>> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
>> the same behavior in connected components
On Jul 30, 2014, at 5:18 AM, Ankur Dave wrote:
> Jeffrey Picard writes:
>> As the program runs I’m seeing each iteration take longer and longer to
>> complete. This seems counterintuitive to me, especially since I am seeing
>> the shuffle read/write amounts decrease w
art files (stored on S3) and it finishes in
about 12 minutes, but with all the data I’ve let it run for up to 4 hours and it
still doesn’t complete. Does anyone have ideas for troubleshooting this, Spark
parameters that might need tuning, etc.?
Best Regards,
Jeffrey Picard
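
For anyone following the thread, here is a minimal sketch of the setup being discussed: running connected components with the graph kept at MEMORY_AND_DISK. It assumes a Spark version whose GraphLoader.edgeListFile accepts edgeStorageLevel and vertexStorageLevel arguments, and it sets the level at load time rather than re-persisting afterwards. The object name and the input path argument are hypothetical, and this is not the code from the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Hypothetical driver object; the name is illustrative only.
object ConnectedComponentsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cc-sketch"))

    // args(0) is a hypothetical path to an edge list ("srcId dstId" per line).
    // Both storage levels are set when the graph is built, so partitions that
    // do not fit in memory can be written to disk instead of being dropped.
    val graph = GraphLoader.edgeListFile(
      sc,
      args(0),
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

    // Each vertex ends up labelled with the smallest vertex id in its component.
    val cc = graph.connectedComponents()

    // Count the distinct component labels.
    val numComponents = cc.vertices.map(_._2).distinct().count()
    println(s"components: $numComponents")

    sc.stop()
  }
}
```

Whether this helps with the slowdown or the CancelledKeyException described above is not something the sketch can show; it only illustrates where the storage level is set.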