Hi all, I'm aware there are a few threads on this, but I haven't been able to solve an issue I'm seeing and hoped someone could help. I'm trying to run the following:
val connectedNetwork = new org.apache.flink.api.scala.DataSet[Vertex[Long, Long]](
  Graph.fromTuple2DataSet(inputEdges, vertexInitialiser, env)
    .run(new ConnectedComponents[Long, NullValue](100)))

And I'm hitting this error:

java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 8 maxPartition: 8 number of overflow segments: 122 bucketSize: 206 Overall memory: 19365888 Partition memory: 8388608
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:753)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromStart(CompactingHashTable.java:546)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:423)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:325)
    at org.apache.flink.runtime.iterative.task.IterationHeadTask.readInitialSolutionSet(IterationHeadTask.java:212)
    at org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:273)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
    at java.lang.Thread.run(Unknown Source)

I'm running Flink 1.0.3 on Windows 10 using start-local.bat, with Xmx set to 6500 MB, 8 workers, parallelism 8, and the other memory settings left at their defaults. The inputEdges dataset contains 141 MB of (Long, Long) pairs, which is around 6 million edges. ParentID is unique and always negative; ChildID is non-unique and always positive (simulating a bipartite graph). A few example rows:

-91498683401,1738
-135344401,5370
-100260517801,7970
-154352186001,12311
-160265532002,12826

The vast majority of the childIds are actually unique, and the most popular ID occurs only 10 times. The vertexInitialiser just sets each vertex's value to its id.
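For completeness, here is how I understand the relevant flink-conf.yaml settings on my setup. The heap value is the one I changed; the others are, as far as I know, the 1.0.x defaults for slots and the managed-memory fraction, so please correct me if I've misread them:

# flink-conf.yaml (sketch of my understanding; only the heap was changed from default)
taskmanager.heap.mb: 6500          # matches the -Xmx I set
taskmanager.numberOfTaskSlots: 8   # the 8 workers mentioned above
parallelism.default: 8
taskmanager.memory.fraction: 0.7   # share of free heap given to Flink's managed memory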
Hopefully this is just a memory setting for the hash table that I'm not seeing, as the job dies almost instantly; I don't think it gets very far into the dataset. I understand that the CompactingHashTable cannot spill, but I'd be surprised if it needed to at these low volumes.

Many thanks for any help!

Rob

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-ran-out-error-when-running-connected-components-tp6888.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.