Hi Ovidiu,
putting the CompactingHashTable aside, all data structures and algorithms
that use managed memory can spill to disk if the data exceeds memory capacity.
It was a conscious choice not to let the CompactingHashTable spill. Once
the solution set hash table is spilled, (parts of) the hash table …
Hi,
Regarding the solution set going out of memory, I would like an issue to be
filed against it.
Looking into the code for CompactingHashTable, I see:
The hash table is internally divided into two parts: the hash index, and the
partition buffers that store the actual records. When records are inserted …
Correction: the CC job I am running successfully is on top of your friend, Spark :)
Best,
Ovidiu
> On 14 Mar 2016, at 20:38, Ovidiu-Cristian MARCU
> wrote:
>
> Yes, largely different. I was expecting the solution set to be spillable.
> This is somehow a very hard limitation; the layout of the data makes the difference.
Yes, largely different. I was expecting the solution set to be spillable.
This is somehow a very hard limitation; the layout of the data makes the
difference.
By contrast, I am able to run CC successfully on the synthetic data, but RDDs
are persisted in memory or on disk.
Best,
Ovidiu
Probably the limitation is that the number of keys is different in the
real and the synthetic data set respectively. Can you confirm this?
The solution set for delta iterations is currently implemented as an
in-memory hash table that works on managed memory segments, but is not
spillable.
– Ufuk
This problem is surprising, as I was able to run PR and CC on a larger graph
(2 bil edges), but with this synthetic graph (1 bil edges, groups of 10) I ran
out of memory; I was using the same configuration (memory and parallelism,
other internals) in both cases.
There is some limitation somewhere; I will …
Hi,
I understand the confusion. So far, I did not run into the problem, but
I think this needs to be addressed, as all our graph processing
abstractions are implemented on top of the delta iteration.
According to the previous mailing list discussion, the problem is with
the solution set and it …
Hi Ovidiu,
this option won't fix the problem if your system doesn't have enough memory
:)
It only defines whether the solution set is kept in managed memory or not.
For more iteration configuration options, take a look at the Gelly
documentation [1].
-Vasia.
[1]:
https://ci.apache.org/projects/f
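Vasia's point above can be illustrated with a small, hypothetical sketch (not Flink code): the unmanaged option only changes where the solution set lives, not how much memory the machine has. A managed store has a fixed byte budget and rejects inserts once it is exhausted, while an unmanaged store is a plain heap map that simply grows until the JVM itself runs out of memory. The class and the per-entry cost estimate are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SolutionSetStore {
    private final boolean managed;
    private final long budgetBytes;            // only enforced when managed
    private long usedBytes = 0;
    private final Map<Long, byte[]> entries = new HashMap<>();

    public SolutionSetStore(boolean managed, long budgetBytes) {
        this.managed = managed;
        this.budgetBytes = budgetBytes;
    }

    /** Returns false when a managed store would exceed its fixed budget. */
    public boolean put(long vertexId, byte[] state) {
        long cost = 8 + state.length;          // rough per-entry estimate
        if (managed && usedBytes + cost > budgetBytes) {
            return false;                      // managed memory exhausted
        }
        byte[] old = entries.put(vertexId, state);
        usedBytes += cost - (old == null ? 0 : 8 + old.length);
        return true;
    }

    public long usedBytes() { return usedBytes; }
}
```

Either way, once the solution set no longer fits in the memory the system actually has, the job fails; the option only trades a managed-memory error for heap pressure and eventual GC/OutOfMemoryError.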
Thank you for this alternative.
I don’t understand how the workaround will fix this on systems with limited
memory and maybe a larger graph.
Running Connected Components on the same graph gives the same problem.
IterationHead(Unnamed Delta Iteration)(82/88) switched to FAILED
java.lang.RuntimeException …
Hi
I think this is the same issue we had before on the list [1]. Stephan
recommended the following workaround:
A possible workaround is to use the option "setSolutionSetUnmanaged(true)"
on the iteration. That will eliminate the fragmentation issue, at least.
Unfortunately, you cannot set th …
Hi,
While running PageRank on a synthetic graph I ran into this problem:
Any advice on how I should proceed to overcome this memory issue?
IterationHead(Vertex-centric iteration
(org.apache.flink.graph.library.PageRank$VertexRankUpdater@7712cae0 |
org.apache.flink.graph.library.PageRank$RankMe …