+ Joey

We think this is worth doing. Are you interested in submitting a pull
request?


On Sat, Feb 20, 2016 at 8:05 PM ahaider3 <ahaid...@hawk.iit.edu> wrote:

> Hi,
> I have been looking through the GraphX source code, dissecting the reason
> for its high memory consumption compared to the on-disk size of the graph.
> I have found that there may be room to reduce the memory footprint of the
> graph structures. I think the biggest savings can come from the localSrcIds
> and localDstIds in EdgePartitions.
>
> In particular, instead of storing both a source and a destination local ID
> for each edge, we could store only the destination ID. For example, after
> sorting edges by global source ID, we can first map each of the source
> vertices to local values, followed by the not-yet-mapped global destination
> IDs. This would make localSrcIds sorted, running from 0 to n - 1, where n is
> the number of distinct global source IDs. Then, instead of actually storing
> the local source ID for each edge, we can store an array of size n, with
> each element storing an index into localDstIds. From my understanding, this
> would also eliminate the need for storing an index for indexed scanning,
> since each element in localSrcIds would be the start of a cluster. In my
> testing, this, along with some delta encoding strategies on localDstIds and
> the mapping structures, can reduce the memory consumption of the graph by
> nearly half.
>
> However, I am not entirely sure whether there is a reason for storing both
> localSrcIds and localDstIds for each edge, for example to support future
> functionality such as graph mutations. I noticed there was another post
> similar to this one as well, but it had no replies.
>
> The idea is quite similar to the Netflix graph library
> <https://github.com/Netflix/netflix-graph>, and I would be happy to open a
> JIRA on this issue with partial improvements. But I may not be completely
> correct in my thinking!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Using-Encoding-to-reduce-GraphX-s-static-graph-memory-consumption-tp16373.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
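
For readers following the thread, the encoding proposed in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not GraphX's actual EdgePartition code: the names CsrSketch, CompactEdges, build, srcOffsets, and dstsOf are hypothetical, the offsets array carries one extra sentinel entry (n + 1 instead of n) to simplify range lookups, and delta encoding is left out.

```scala
object CsrSketch {
  // Hypothetical compact layout: no per-edge source ID is stored.
  final case class CompactEdges(
      local2global: Array[Long], // local vertex ID -> global vertex ID
      srcOffsets: Array[Int],    // entry i = index of the first edge of local source i
      localDstIds: Array[Int]) { // one local destination ID per edge

    def numSources: Int = srcOffsets.length - 1

    // Global destination IDs of all edges leaving the given local source.
    def dstsOf(localSrc: Int): Seq[Long] =
      (srcOffsets(localSrc) until srcOffsets(localSrc + 1))
        .map(i => local2global(localDstIds(i)))
  }

  def build(edges: Seq[(Long, Long)]): CompactEdges = {
    // Sort edges by global source ID so each source forms one contiguous cluster.
    val sorted = edges.sortBy(_._1).toVector
    val srcs = sorted.map(_._1).distinct
    val srcSet = srcs.toSet

    // Sources take local IDs 0 until n; destinations that are not also
    // sources take the local IDs after that.
    val local2global = (srcs ++ sorted.map(_._2).filterNot(srcSet).distinct).toArray
    val global2local = local2global.zipWithIndex.toMap

    // Record where each source's cluster starts; the last entry is a sentinel.
    val srcOffsets = new Array[Int](srcs.length + 1)
    var i = 0
    var local = 0
    while (local < srcs.length) {
      srcOffsets(local) = i
      while (i < sorted.length && sorted(i)._1 == srcs(local)) i += 1
      local += 1
    }
    srcOffsets(srcs.length) = sorted.length

    CompactEdges(local2global, srcOffsets, sorted.map(e => global2local(e._2)).toArray)
  }
}
```

With this layout, the edges of a source are found by reading two adjacent entries of srcOffsets, which is why a separate index for clustered scans would no longer be needed. Whether it fits GraphX's requirements (for example, graph mutations) is the open question raised in the thread.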
