Hi all, I'm aware there are a few threads on this, but I haven't been able to solve an issue I'm seeing and hoped someone could help. I'm trying to run the following:
val connectedNetwork = new org.apache.flink.api.scala.DataSet[Vertex[Long, Long]](
  Graph.fromTuple2DataSet(inputEdges, vertexInitialiser, env)
    .run(new ConnectedComponents[Long, NullValue](100)))

And I'm hitting this error:

java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 8 maxPartition: 8 number of overflow segments: 122 bucketSize: 206 Overall memory: 19365888 Partition memory: 8388608
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:753)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromStart(CompactingHashTable.java:546)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:423)
    at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:325)
    at org.apache.flink.runtime.iterative.task.IterationHeadTask.readInitialSolutionSet(IterationHeadTask.java:212)
    at org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:273)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
    at java.lang.Thread.run(Unknown Source)

I'm running Flink 1.0.3 on Windows 10 using start-local.bat, with Xmx set to 6500 MB, 8 workers, parallelism 8, and the other memory settings left at their defaults. The inputEdges dataset contains 141 MB of (Long, Long) pairs, which is around 6 million edges. ParentID is unique and always negative; ChildID is non-unique and always positive (simulating a bipartite graph). A few example rows:

-91498683401,1738
-135344401,5370
-100260517801,7970
-154352186001,12311
-160265532002,12826

The vast majority of the childIds are actually unique, and the most popular ID occurs only 10 times. The vertexInitialiser just sets each vertex's value to its id.
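For completeness, here is how I understand the relevant flink-conf.yaml settings on my setup. The heap value is the one I changed; the others are, as far as I know, the 1.0.x defaults for slots and the managed-memory fraction, so please correct me if I've misread them:

# flink-conf.yaml (sketch of my understanding; only the heap was changed from default)
taskmanager.heap.mb: 6500          # matches the -Xmx I set
taskmanager.numberOfTaskSlots: 8   # the 8 workers mentioned above
parallelism.default: 8
taskmanager.memory.fraction: 0.7   # share of free heap given to Flink's managed memory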
Hopefully this is just a memory setting for the hash table that I'm not seeing, as the job dies almost instantly; I don't think it gets very far into the dataset. I understand that the CompactingHashTable cannot spill, but I'd be surprised if it needed to at these low volumes.

Many thanks for any help!

Rob

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-ran-out-error-when-running-connected-components-tp6888.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.