Re: "Memory ran out" error when running connected components

Vasiliki Kalavri Fri, 13 May 2016 04:50:56 -0700

Hi Rob,


On 13 May 2016 at 11:22, Arkay <robkee...@gmail.com> wrote:

> Hi to all,
>
> I’m aware there are a few threads on this, but I haven’t been able to solve
> an issue I am seeing and hope someone can help. I’m trying to run the
> following:
>
> val connectedNetwork = new org.apache.flink.api.scala.DataSet[Vertex[Long, Long]](
>   Graph.fromTuple2DataSet(inputEdges, vertexInitialiser, env)
>     .run(new ConnectedComponents[Long, NullValue](100)))
>
> And hitting the error:
>
> java.lang.RuntimeException: Memory ran out. numPartitions: 32 minPartition: 8 maxPartition: 8 number of overflow segments: 122 bucketSize: 206 Overall memory: 19365888 Partition memory: 8388608
>          at org.apache.flink.runtime.operators.hash.CompactingHashTable.getNextBuffer(CompactingHashTable.java:753)
>          at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertBucketEntryFromStart(CompactingHashTable.java:546)
>          at org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:423)
>          at org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:325)
>          at org.apache.flink.runtime.iterative.task.IterationHeadTask.readInitialSolutionSet(IterationHeadTask.java:212)
>          at org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:273)
>          at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>          at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>          at java.lang.Thread.run(Unknown Source)
>
> I’m running Flink 1.0.3 on Windows 10 using start-local.bat. I have Xmx set
> to 6500MB, 8 workers, parallelism 8 and other memory settings left at
> default.
>

The start-local script starts a single JobManager and a single TaskManager.
What do you mean by 8 workers? Have you set taskmanager.numberOfTaskSlots to
8? To give all available memory to your TaskManager, you should set the
"taskmanager.heap.mb" configuration option in flink-conf.yaml. Can you open
the Flink dashboard at http://localhost:8081/ and check the configuration
of your TaskManager?
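The numbers in the exception seem to point the same way. Below is a rough back-of-envelope check in Scala. The assumptions are mine, not Flink's exact memory accounting: 32 KiB memory pages (Flink's default segment size), 16 bytes of raw payload per Vertex[Long, Long] (two longs) with all hash-table overhead ignored, and the solution set spread evenly over the 8 parallel instances. Each edge contributes at most two distinct vertices, and since most child IDs are unique, the real vertex count should be close to that 12 million upper bound.

```scala
// Rough check of the CompactingHashTable numbers from the exception.
// ASSUMPTIONS (illustrative, not Flink's exact memory accounting):
//  - 32 KiB memory pages (Flink's default segment size)
//  - 16 bytes of raw payload per Vertex[Long, Long]; bucket/pointer overhead ignored
//  - the solution set is spread evenly over the parallel instances
object SolutionSetEstimate {
  val PageSize: Long = 32L * 1024
  val BytesPerVertex: Long = 16L

  val OverallMemory: Long = 19365888L // "Overall memory" reported in the exception
  val Edges: Long = 6000000L          // "around 6 million edges"
  val Parallelism: Int = 8

  // Each edge contributes at most two distinct vertices.
  val maxVertices: Long = 2 * Edges
  // Raw payload one parallel instance would hold at that upper bound.
  val payloadPerSlot: Long = maxVertices / Parallelism * BytesPerVertex

  // Convert bytes to MiB for printing.
  def mib(bytes: Long): Double = bytes.toDouble / (1024 * 1024)

  def main(args: Array[String]): Unit = {
    println(f"hash table memory: ${mib(OverallMemory)}%.1f MiB (${OverallMemory / PageSize} pages)")
    println(f"vertex payload per instance (upper bound): ${mib(payloadPerSlot)}%.1f MiB")
  }
}
```

Run as written, it prints about 18.5 MiB (591 pages) for the hash table against roughly 22.9 MiB of vertex payload per instance, so under these assumptions the solution set would not fit even before any per-entry overhead. A TaskManager that really had several gigabytes of heap should hand the table far more than 18.5 MiB, which suggests the heap setting is not reaching the TaskManager.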

Cheers,
-Vasia.


> The inputEdges dataset contains 141MB of Long,Long pairs (around 6
> million edges). ParentID is unique and always negative; ChildID is
> non-unique and always positive (simulating a bipartite graph).
>
> An example few rows:
> -91498683401,1738
> -135344401,5370
> -100260517801,7970
> -154352186001,12311
> -160265532002,12826
>
> The vast majority of the childIds are actually unique, and the most popular
> ID only occurs 10 times.
>
> VertexInitialiser just sets the vertex value to the id.
>
> Hopefully this is just a memory setting for the hash table that I’m not
> seeing: it dies almost instantly, so I don’t think it gets very far into the
> dataset. I understand that the CompactingHashTable cannot spill, but I’d be
> surprised if it needed to at these low volumes.
>
> Many thanks for any help!
>
> Rob
>
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-ran-out-error-when-running-connected-components-tp6888.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>
