Yes, sounds good. I can submit the pull request.
On 22 Feb 2016 00:35, "Reynold Xin" <r...@databricks.com> wrote:

> + Joey
>
> We think this is worth doing. Are you interested in submitting a pull
> request?
>
>
> On Sat, Feb 20, 2016 at 8:05 PM ahaider3 <ahaid...@hawk.iit.edu> wrote:
>
>> Hi,
>> I have been looking through the GraphX source code to understand why its
>> memory consumption is so high compared to the on-disk size of the graph.
>> I have found that there may be room to reduce the memory footprint of the
>> graph structures. I think the biggest savings can come from the
>> localSrcIds and localDstIds arrays in EdgePartitions.
>>
>> In particular, instead of storing both a source and a destination local ID
>> for each edge, we could store only the destination ID. For example, after
>> sorting edges by global source ID, we can map each of the source vertices
>> to local IDs first, followed by the remaining, not-yet-mapped global
>> destination IDs. This would make the local source IDs run in order from 0
>> to n - 1, where n is the number of distinct global source IDs. Then,
>> instead of actually storing a local source ID for each edge, we can store
>> an array of size n in which each element is an index into localDstIds
>> marking where that source's edges begin. From my understanding, this would
>> also eliminate the need to store a separate index for indexed scanning,
>> since each element of this array would be the start of a cluster. In some
>> extensive testing, this, combined with some delta encoding strategies on
>> localDstIds and the mapping structures, can reduce the memory consumption
>> of the graph by nearly half.
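A minimal Python sketch of the layout described above, assuming a partition is simply a list of (global source ID, global destination ID) pairs with no edge attributes. The function names (`build_partition`, `neighbors`), the gap-plus-varint (LEB128) encoding, and the choice to store byte offsets into the encoded destination stream instead of edge indices are illustrative assumptions; they are not GraphX's `EdgePartition` API, nor necessarily the delta encoding used in the testing mentioned above. Sources get local IDs 0..n-1 first, destination-only vertices get IDs from n upward, and each source keeps only an offset marking where its cluster of destinations starts (plus one sentinel entry so the last cluster's end needs no special case).

```python
from typing import Dict, List, Tuple


def encode_varint(value: int, out: bytearray) -> None:
    """Append a non-negative int as an unsigned LEB128 varint (7 bits per byte)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return


def decode_varints(buf: bytes, start: int, end: int) -> List[int]:
    """Decode every varint stored in buf[start:end]."""
    values, cur, shift = [], 0, 0
    for pos in range(start, end):
        byte = buf[pos]
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def build_partition(edges: List[Tuple[int, int]]):
    """Encode (globalSrc, globalDst) edges without a per-edge source ID.

    Returns (local_to_global, byte_offsets, dst_bytes); the destinations of
    local source i are delta-encoded varints in
    dst_bytes[byte_offsets[i]:byte_offsets[i + 1]].
    """
    # 1. Distinct sources, in ascending global order, get local IDs 0..n-1.
    sources = sorted({src for src, _ in edges})
    global_to_local: Dict[int, int] = {g: i for i, g in enumerate(sources)}
    local_to_global = list(sources)
    # 2. Destination-only vertices get the remaining local IDs n, n+1, ...
    dst_only = {dst for _, dst in edges} - set(sources)
    for g in sorted(dst_only):
        global_to_local[g] = len(local_to_global)
        local_to_global.append(g)

    # 3. Group local destination IDs by local source ID (one cluster per source).
    clusters: List[List[int]] = [[] for _ in sources]
    for src, dst in edges:
        clusters[global_to_local[src]].append(global_to_local[dst])

    # 4. Sort each cluster and store the gaps between consecutive IDs as
    #    varints; byte_offsets has n + 1 entries (the last is a sentinel).
    dst_bytes = bytearray()
    byte_offsets = [0]
    for cluster in clusters:
        prev = 0
        for local_dst in sorted(cluster):
            encode_varint(local_dst - prev, dst_bytes)
            prev = local_dst
        byte_offsets.append(len(dst_bytes))
    return local_to_global, byte_offsets, bytes(dst_bytes)


def neighbors(local_to_global: List[int], byte_offsets: List[int],
              dst_bytes: bytes, local_src: int) -> List[int]:
    """Scan one source's cluster and return its destinations as global IDs."""
    gaps = decode_varints(dst_bytes, byte_offsets[local_src],
                          byte_offsets[local_src + 1])
    result, cur = [], 0
    for gap in gaps:
        cur += gap  # undo the delta encoding
        result.append(local_to_global[cur])
    return result
```

Because each cluster's destination IDs are sorted before encoding, the gaps are small non-negative integers, and any gap below 128 fits in one varint byte rather than a 4-byte Int. In a real partition, sorting within a cluster would also require applying the same permutation to any parallel edge-attribute array.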
>>
>> However, I am not entirely sure whether there is a reason for storing both
>> localSrcIds and localDstIds for each edge, for example to support future
>> functionality such as graph mutations. I noticed there was another post
>> similar to this one as well, but it had no replies.
>>
>> The idea is quite similar to the Netflix graph library
>> <https://github.com/Netflix/netflix-graph>, and I would be happy to open a
>> JIRA on this issue with partial improvements. But I may not be completely
>> correct in my thinking!
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Using-Encoding-to-reduce-GraphX-s-static-graph-memory-consumption-tp16373.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
